Abstract. We study convergence properties of pseudo-marginal Markov chain Monte Carlo algorithms [Andrieu and Roberts, Ann. Statist. 37 (2009) 697-725]. We find that the asymptotic variance of the pseudo-marginal algorithm is always at least as large as that of the marginal algorithm. We show that if the marginal chain is geometrically ergodic and the weights (normalised estimates of the target density) are uniformly bounded, then the pseudo-marginal chain is geometrically ergodic. We also consider unbounded weight distributions and recover polynomial convergence rates in more specific cases: when the marginal algorithm is uniformly ergodic, an independent Metropolis-Hastings, or a random-walk Metropolis targeting a super-exponential density with regular contours. Our results on geometric and polynomial convergence rates imply central limit theorems. We also prove that under general conditions, the asymptotic variance of the pseudo-marginal algorithm converges to the asymptotic variance of the marginal algorithm if the accuracy of the estimators is increased.

1. Introduction 

Assume that one is interested in sampling from a probability distribution $\pi$ defined on some measurable space $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$. One practical recipe to achieve this in complex scenarios consists of using Markov chain Monte Carlo (MCMC) methods, of which the Metropolis-Hastings update is the main workhorse [?, 19].
We may write the Markov kernel related to a Metropolis-Hastings algorithm in the form

$$P(x, dy) := \min\{1, r(x,y)\}\, q(x, dy) + \delta_x(dy)\, \rho(x), \tag{1}$$

where $r(x,y)$ is the Radon-Nikodym derivative as defined in [29] and $\rho(x)$ is the probability of rejection,

$$r(x,y) := \frac{\pi(dy)\, q(y, dx)}{\pi(dx)\, q(x, dy)}, \qquad \rho(x) := 1 - \int \min\{1, r(x,y)\}\, q(x, dy), \tag{2}$$

where $q$ is the so-called proposal kernel (or proposal distribution). We follow the terminology of [?] and call this method the marginal algorithm.

In some situations, the marginal algorithm cannot be implemented due to the intractability of the distribution $\pi$. For example, assuming that $\pi$ and $q$ have densities (also denoted $\pi$ and $q$) with respect to some $\sigma$-finite measure, it may be that $\pi$ cannot be evaluated pointwise, and although $r(x,y)$ may be well defined theoretically, it cannot be evaluated either. However, in some situations unbiased non-negative estimates $\hat\pi(x) = W_x \pi(x)$ may be available; that is, $W_x \sim Q_x(\cdot)$ with $W_x \ge 0$ and $\mathbb{E}[W_x] = 1$ for any $x \in \mathsf{X}$ (we will refer to $W_x$ as a "weight" throughout the paper). A naive idea may be to use such estimates in place of the true values in order to compute the acceptance probability. A remarkable property is that such an algorithm is in fact correct [3]. This can be seen by considering the following probability distribution

$$\tilde\pi(dx, dw) := \pi(dx)\, \pi_x(dw) \quad\text{with}\quad \pi_x(dw) := Q_x(dw)\, w \tag{3}$$

on the product space $(\mathsf{X} \times \mathsf{W}, \mathcal{B}(\mathsf{X}) \times \mathcal{B}(\mathsf{W}))$, where $\mathsf{W}$ is a Borel subset of $\mathbb{R}_+$ and $\mathcal{B}(\mathsf{W})$ are the Borel sets on $\mathsf{W}$. Here $\pi_x(dw)$ is a probability measure for each $x \in \mathsf{X}$, and therefore $\pi$ is a marginal distribution of $\tilde\pi$.

It is possible to implement a Metropolis-Hastings algorithm targeting $\tilde\pi(dx, dw)$ using the proposal kernel $\tilde q(x, w; dy, du) := q(x, dy)\, Q_y(du)$ by defining

$$\tilde P(x, w; dy, du) := \min\Big\{1, r(x,y)\frac{u}{w}\Big\}\, q(x, dy)\, Q_y(du) + \delta_{x,w}(dy, du)\, \tilde\rho(x, w), \tag{4}$$

where the probability of rejection is given as

$$\tilde\rho(x, w) := 1 - \iint \min\Big\{1, r(x,y)\frac{u}{w}\Big\}\, q(x, dy)\, Q_y(du).$$

This is the pseudo-marginal algorithm [3], which targets $\pi$ marginally since $\pi$ is a marginal distribution of $\tilde\pi$, and which may be implemented in situations where the marginal algorithm may not. As a particular instance of the Metropolis-Hastings algorithm, the pseudo-marginal algorithm converges to $\tilde\pi$ under mild assumptions [e.g. 23], and although it may be seen as a "noisy" version of the marginal algorithm, it is exact since it allows us to target the distribution of interest $\pi$. The aim of this paper is to study some of the theoretical properties of such algorithms in terms of the properties of the weights and those of the marginal algorithm. More precisely, we investigate the rate of convergence of the pseudo-marginal algorithm to equilibrium and characterise the approximation of the marginal algorithm by the pseudo-marginal algorithm in terms of the variability of their respective ergodic averages.

The apparently abstract structure of the pseudo-marginal algorithm is in fact shared by several practical algorithms which have recently been proposed in order to sample from intractable distributions. The distribution of $w$ is most often implicit, as we illustrate now with one of the simplest examples. Assume for simplicity that the space $\mathsf{X}$ is (a Borel subset of) $\mathbb{R}^d$, that $\mathcal{B}(\mathsf{X})$ consists of the Borel subsets of $\mathsf{X}$, and that both $\pi$ and $q(x, \cdot)$ (for any $x \in \mathsf{X}$) have densities with respect to the Lebesgue measure. Consider a situation where the target density is of the form $\pi(x) = \int \pi(x, z)\, dz$, where the integral cannot be computed analytically. One can suggest approximating this density with an importance sampling estimate of the integral,

$$W_x \pi(x) = \hat\pi(x) = \frac{1}{N} \sum_{k=1}^{N} \frac{\pi(x, Z_k)}{h_x(Z_k)}, \qquad Z_k \sim h_x(\cdot) \text{ independently}, \tag{5}$$

where $h_x$ is a probability density for each $x \in \mathsf{X}$. Note that it is in fact possible to consider unbiased estimators up to a normalising constant, since such a constant cancels in the acceptance ratio of the pseudo-marginal algorithm; without loss of generality we will assume this constant to be equal to one throughout. This setting was considered by Beaumont in the seminal paper [?], and various extensions were proposed in [3]. There are more involved applications of this idea. In the context of state-space models, it has been shown in [1] that $W_x$ can be obtained with a particle filter, resulting in "particle MCMC" algorithms. In [?] it was shown how exact sampling methods can be used to carry out inference in discretely observed diffusion models for which the transition probability is intractable. See also the discussion [13] on the connection between pseudo-marginal MCMC and approximate Bayesian computation.
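To make (5) concrete, here is a minimal Python (NumPy) sketch of a pseudo-marginal random-walk Metropolis chain. It assumes a hypothetical toy model $\pi(x, z) = N(x; 0, 1)\, N(z; x, 1)$, chosen only because then $\pi(x) = N(x; 0, 1)$ is known and the output can be checked. The importance density $h_x = N(x, 2^2)$, the step size and all function names are likewise illustrative choices, not taken from the text. In this toy case each importance ratio equals $\pi(x) \cdot 2\exp(-3(z - x)^2/8)$, so the weights $W_x$ are bounded by 2.

```python
import numpy as np

def log_joint(x, z):
    # Hypothetical toy model: pi(x, z) = N(x; 0, 1) N(z; x, 1), so that pi(x) = N(x; 0, 1).
    return -0.5 * x**2 - 0.5 * (z - x) ** 2 - np.log(2.0 * np.pi)

def is_estimate(x, n, rng, scale=2.0):
    """Unbiased importance sampling estimate (5) of pi(x), with the assumed h_x = N(x, scale^2)."""
    z = x + scale * rng.standard_normal(n)                      # Z_k ~ h_x, independently
    log_h = -0.5 * ((z - x) / scale) ** 2 - np.log(scale * np.sqrt(2.0 * np.pi))
    return np.mean(np.exp(log_joint(x, z) - log_h))             # (1/N) sum pi(x, Z_k) / h_x(Z_k)

def pseudo_marginal_rwm(n_iter, n, step=1.0, seed=0):
    """Pseudo-marginal random-walk Metropolis: kernel (4) with a symmetric Gaussian q(x, dy)."""
    rng = np.random.default_rng(seed)
    x = 0.0
    pi_hat_x = is_estimate(x, n, rng)
    chain = np.empty(n_iter)
    accepted = 0
    for i in range(n_iter):
        y = x + step * rng.standard_normal()
        pi_hat_y = is_estimate(y, n, rng)        # a fresh estimate is drawn at the proposal only
        # For symmetric q, min{1, r(x, y) u / w} = min{1, pi_hat(y) / pi_hat(x)}.
        if rng.uniform() * pi_hat_x < pi_hat_y:
            x, pi_hat_x = y, pi_hat_y            # the estimate at the current state is recycled
            accepted += 1
        chain[i] = x
    return chain, accepted / n_iter

chain, acc_rate = pseudo_marginal_rwm(20_000, n=10)
print(f"mean={chain.mean():.3f}  var={chain.var():.3f}  acceptance={acc_rate:.2f}")
```

The current estimate $\hat\pi(x)$ is kept until a proposal is accepted; re-estimating it at every iteration would give a different, "noisy" algorithm that is in general not exact.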

We now summarise our main findings, which are of two different natures, although some of their underpinnings and consequences are related.

Rates of convergence. In previous work [3] it has been shown that a pseudo-marginal chain is uniformly ergodic whenever the marginal algorithm targeting $\pi(x)$ is uniformly ergodic and the weights are bounded uniformly in $x$. It was also shown that geometric ergodicity is not possible as soon as the weights $W_x$ are unbounded on a set of positive $\pi$-probability. We extend the analysis of the convergence rates of pseudo-marginal algorithms in several directions.

In Section 3, we show that if the marginal chain is geometrically ergodic and the weights are bounded uniformly in $x$, then the pseudo-marginal chain is geometrically ergodic. Our proof relies on lower bounding the spectral gap (Propositions 7 and 9).

In most scenarios of interest, the support of the weight distributions is unbounded, implying that the corresponding pseudo-marginal algorithms cannot be geometric. We show that under various moment conditions on the weights, the pseudo-marginal algorithms have a specific sub-geometric rate of convergence. More precisely, in Section 5, assuming that the marginal algorithm is uniformly ergodic and the weight distributions are uniformly integrable, we establish the existence of a sub-geometric drift condition towards a small set (Proposition 22) for an appropriate Lyapunov function. For example, we show the existence of a polynomial drift condition (Corollary 25) when the weight distributions satisfy moment bounds. This, together with an additional mild assumption, allows us to establish sub-geometric rates of convergence.

In Section 6 we focus on the specific case where the marginal algorithm is the independent Metropolis-Hastings (IMH). We show that the existence of (not necessarily uniform) moment bounds for the weights leads to polynomial rates, while the existence of exponential moments leads to sub-exponential rates (Proposition 26 and its corollaries). In Section 7 we consider the popular random-walk Metropolis (RWM). Assuming standard tail conditions on $\pi$ which ensure the geometric ergodicity of a RWM [13] and the existence of uniform moment bounds, we show that the corresponding pseudo-marginal algorithm is polynomially ergodic (Theorem 32). We extend this result to the case of non-uniform moment bounds (i.e. allowing them to grow in the tail of $\pi$) in Theorem 40.

Asymptotic variance. It is natural to compare the asymptotic performance of ergodic averages obtained from a marginal algorithm and its pseudo-marginal counterpart. One can in fact ask a more general question of practical relevance. In practice, it is often possible to choose the weight distributions $Q_x$ from a family $\{Q_x^N\}_{N \in \mathbb{N}}$ indexed by an accuracy parameter $N$, as for example in (5). In such situations $\pi_x^N(dw) := Q_x^N(dw)\, w$ converges weakly to $\delta_1(dw)$ as $N \to \infty$, and one may wonder whether the asymptotic variance of the corresponding ergodic averages converges to that of the marginal algorithm.

In Section 2 we first show that the pseudo-marginal and marginal algorithms are ordered both in terms of the mean acceptance probability (Corollary 3) and the asymptotic variance (Theorem 6). The latter result relies on a generalisation of the argument due to Peskun [22, 29]. This supports and generalises the empirical observation on toy examples that the pseudo-marginal algorithm cannot be more efficient than its marginal version.

When the weights are uniformly bounded in $x$, we start Section 4 with a simple upper bound on the asymptotic variance of the pseudo-marginal algorithm (Corollary 10), from which it is straightforward to deduce that it converges to that of the marginal algorithm when the weight upper bound goes to one. We generalise this result to the situation where the weights are unbounded, but $\pi_x^N(dw)$ converges weakly to $\delta_1(dw)$ as $N \to \infty$ (Theorem 15). We also show how the sub-geometric ergodicity results mentioned above are essential to establish the conditions of this theorem in practice (Proposition 19).

We conclude in Section 8, where we briefly discuss additional implications of our results, such as the existence of central limit theorems, the possibility to compute quantitative expressions for the asymptotic variance, and the analysis of generalisations of pseudo-marginal algorithms.

2. Ordering of the marginal and pseudo-marginal algorithms 

We first introduce some standard notation related to probability measures and Markov transition probabilities. For $\Pi$ a Markov kernel and $\mu$ a probability measure defined on some measurable space $(\mathsf{E}, \mathcal{B}(\mathsf{E}))$, and $f$ a measurable real-valued function on $\mathsf{E}$, we let for any $x \in \mathsf{E}$, $\Pi^0 f(x) := f(x)$,

$$\mu(f) := \int f(x)\, \mu(dx) \quad\text{and}\quad \Pi^n f(x) := \int \Pi(x, dy)\, \Pi^{n-1} f(y) \quad\text{for } n \ge 1.$$

We will also denote the inner product between two real-valued functions $f$ and $g$ on $\mathsf{E}$ as $\langle f, g\rangle_\mu := \int f(x) g(x)\, \mu(dx)$ and the associated norm $\|f\|_\mu := \langle f, f\rangle_\mu^{1/2}$.

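We start with a simple lemma, which plays a key role in the ordering of the marginal and the pseudo-marginal algorithms.

Lemma 1. For any $x, y \in \mathsf{X}$ we have

$$\iint Q_x(dw)\, w\, Q_y(du) \min\Big\{1, r(x,y)\frac{u}{w}\Big\} \le \min\{1, r(x,y)\}.$$

Proof. Notice that $t \mapsto \min\{1, t\}$ is a concave function. Therefore, one can apply Jensen's inequality with the probability measure $Q_x(dw)\, w\, Q_y(du)$, under which $r(x,y)u/w$ has mean $r(x,y)\, Q_x((0,\infty)) \le r(x,y)$; since $t \mapsto \min\{1, t\}$ is also non-decreasing, this gives the desired inequality. □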
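Lemma 1 can be checked exactly for a discrete weight distribution. The Python sketch below assumes, purely for illustration, that $Q_x = Q_y = Q$ with $Q(\{0.5\}) = Q(\{1.5\}) = 1/2$, so that $\mathbb{E}[W] = 1$, and evaluates the left-hand side for several values of $r(x, y)$. For $r(x, y) \le 1/3$ the minimum never saturates and both sides coincide, while for $r(x, y) = 1$ the left-hand side reduces to $\mathbb{E}[\min\{W, U\}] = 0.75 < 1$, so the inequality can be strict.

```python
import numpy as np

def lemma1_lhs(r, w_vals, w_probs):
    """Left-hand side of Lemma 1 when Q_x = Q_y = Q is discrete with positive support.

    Sums Q(dw) w Q(du) min{1, r u / w} over all pairs (w, u) of support points."""
    w = np.asarray(w_vals, dtype=float)
    p = np.asarray(w_probs, dtype=float)
    assert np.isclose(p.sum(), 1.0) and np.isclose(p @ w, 1.0), "Q must have mean one"
    mass = (p * w)[:, None] * p[None, :]            # rows: current weight w, columns: proposed u
    ratio = r * w[None, :] / w[:, None]             # r(x, y) u / w
    return float(np.sum(mass * np.minimum(1.0, ratio)))

w_vals, w_probs = [0.5, 1.5], [0.5, 0.5]            # hypothetical weight distribution, mean one
for r in [0.1, 1 / 3, 1.0, 2.0, 10.0]:
    print(f"r={r:.3f}  lhs={lemma1_lhs(r, w_vals, w_probs):.4f}  min(1, r)={min(1.0, r):.4f}")
```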

In order to facilitate the comparison of $P$ and $\tilde P$, we follow [?] and introduce an auxiliary transition probability $\bar P$, which is defined on the same space as the pseudo-marginal kernel $\tilde P$ and is reversible with respect to $\tilde\pi$,

$$\bar P(x, w; dy, du) := q(x, dy)\, \pi_y(du) \min\{1, r(x,y)\} + \delta_{x,w}(dy, du)\, \rho(x). \tag{6}$$
Application of Lemma 1 leads to the generic result below, which in turn implies an order between the expected acceptance rates (Corollary 3) and the asymptotic variances (Theorem 6) of the marginal and pseudo-marginal algorithms.

Proposition 2. Let $g : \mathsf{X}^2 \to [0, \infty)$ be a symmetric measurable function, that is, such that $g(x, y) = g(y, x)$ for all $x, y \in \mathsf{X}$. Define

$$\mathcal{A}_{\bar P}(g) := \int \tilde\pi(dx, dw) \int q(x, dy)\, \pi_y(du) \min\{1, r(x,y)\}\, g(x,y),$$

$$\mathcal{A}_{\tilde P}(g) := \int \tilde\pi(dx, dw) \int q(x, dy)\, Q_y(du) \min\Big\{1, r(x,y)\frac{u}{w}\Big\}\, g(x,y).$$

Then we have $\mathcal{A}_{\bar P}(g) \ge \mathcal{A}_{\tilde P}(g)$ and, whenever these quantities are finite,

$$\mathcal{A}_{\bar P}(g) - \mathcal{A}_{\tilde P}(g) \le \int \pi(dx)\, Q_x(dw)\, |w - 1| \int q(x, dy) \min\{1, r(x,y)\}\, g(x,y).$$

Proof. Denote $a(x, y, u, w) := \min\{1, r(x,y)\} - \min\{1, r(x,y)\frac{u}{w}\}$. Since $\int \pi_y(du) = 1 = \int Q_y(du)$, we may write for a bounded function $g$

$$\mathcal{A}_{\bar P}(g) - \mathcal{A}_{\tilde P}(g) = \int \pi(dx)\, q(x, dy)\, g(x,y) \int Q_x(dw)\, w\, Q_y(du)\, a(x, y, u, w) \ge 0,$$

where the inequality is a consequence of Lemma 1. The general case follows by a truncation argument.

For the second bound, note that $\min\{1, r(x,y)\frac{u}{w}\} \ge \min\{1, r(x,y)\} \min\{1, \frac{u}{w}\}$ and $2\min\{u, w\} = u + w - |u - w|$, and compute

$$\begin{aligned}
\mathcal{A}_{\tilde P}(g) &\ge \int \pi(dx)\, q(x, dy)\, Q_x(dw)\, Q_y(du) \min\{1, r(x,y)\} \min\{u, w\}\, g(x,y) \\
&= \mathcal{A}_{\bar P}(g) - \frac12 \int \pi(dx)\, q(x, dy)\, Q_x(dw)\, Q_y(du) \min\{1, r(x,y)\}\, |u - w|\, g(x,y) \\
&\ge \mathcal{A}_{\bar P}(g) - \int \pi(dx)\, Q_x(dw)\, |1 - w| \int q(x, dy) \min\{1, r(x,y)\}\, g(x,y),
\end{aligned}$$

where the last inequality follows by the bound $|u - w| \le |1 - u| + |1 - w|$, the symmetry of $g(x,y)$ and because

$$\pi(dx)\, q(x, dy) \min\{1, r(x,y)\} = \pi(dy)\, q(y, dx) \min\{1, r(y,x)\}.$$

□

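Corollary 3. Let us denote the expected acceptance rates of the marginal and the pseudo-marginal algorithms as

$$\alpha_P := \int \pi(dx) \int q(x, dy) \min\{1, r(x,y)\}, \qquad \alpha_{\tilde P} := \int \tilde\pi(dx, dw) \int q(x, dy)\, Q_y(du) \min\Big\{1, r(x,y)\frac{u}{w}\Big\},$$

respectively. Then we have

$$0 \le \alpha_P - \alpha_{\tilde P} \le \int |w - 1|\, \pi(dx)\, \big(1 - \rho(x)\big)\, Q_x(dw) \le \int |w - 1|\, \pi(dx)\, Q_x(dw).$$

Proof. Observe first that

$$\alpha_{\bar P} := \int \tilde\pi(dx, dw) \int q(x, dy)\, \pi_y(du) \min\{1, r(x,y)\} = \alpha_P.$$

Applying then Proposition 2 with $g \equiv 1$, and noting that $\int q(x, dy) \min\{1, r(x,y)\} = 1 - \rho(x)$, implies

$$0 \le \alpha_P - \alpha_{\tilde P} \le \int |w - 1|\, \pi(dx)\, \big(1 - \rho(x)\big)\, Q_x(dw).$$

The last inequality follows because $\rho(x) \in [0, 1]$ for all $x \in \mathsf{X}$. □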
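Corollary 3 can be verified exactly on a finite state space. The Python sketch below builds the marginal kernel (1) and the pseudo-marginal kernel (4) for a hypothetical four-state target with a uniform proposal and discrete weight distributions on $\{0.5, 1, 2\}$ (each with mean one), and compares the two expected acceptance rates with the bound of Corollary 3. The helper `pm_kernels` and all numbers are illustrative assumptions. Because $q$ has a zero diagonal, the diagonal entries of the two matrices are exactly the rejection probabilities $\rho(x)$ and $\tilde\rho(x, w)$.

```python
import numpy as np

def pm_kernels(pi, q, w, Qw):
    """Marginal kernel (1) and pseudo-marginal kernel (4) on a finite space (illustrative sketch).

    pi: target on m states; q: m x m proposal matrix with zero diagonal;
    w: weight support of length K (all > 0); Qw[x, k] = Q_x({w[k]}), each row with mean one."""
    m, K = Qw.shape
    safe = np.where(q > 0, pi[:, None] * q, 1.0)
    r = np.where(q > 0, pi[None, :] * q.T / safe, 0.0)          # r(x, y)
    P = q * np.minimum(1.0, r)
    P[np.diag_indices(m)] += 1.0 - P.sum(axis=1)                # rejection rho(x)
    # acceptance min{1, r(x, y) u / w}, indexed [x, k (current w), y, l (proposed u)]
    acc = np.minimum(1.0, r[:, None, :, None] * w[None, None, None, :] / w[None, :, None, None])
    Pt = (q[:, None, :, None] * Qw[None, None, :, :] * acc).reshape(m * K, m * K)
    Pt[np.diag_indices(m * K)] += 1.0 - Pt.sum(axis=1)          # rejection rho~(x, w)
    pi_t = (pi[:, None] * Qw * w[None, :]).ravel()              # pi~(x, w) = pi(x) Q_x(w) w
    return P, Pt, pi_t

pi = np.array([0.1, 0.2, 0.3, 0.4])                            # hypothetical target
m = len(pi)
q = (np.ones((m, m)) - np.eye(m)) / (m - 1)                    # uniform proposal on the other states
w = np.array([0.5, 1.0, 2.0])                                  # hypothetical weight support
a = np.array([0.1, 0.3, 0.5, 0.6])
Qw = np.stack([a, 1.0 - 1.5 * a, 0.5 * a], axis=1)             # mean: 0.5a + (1 - 1.5a) + a = 1

P, Pt, pi_t = pm_kernels(pi, q, w, Qw)
rho = np.diag(P)
alpha_P = pi @ (1.0 - rho)                                     # expected acceptance, marginal
alpha_Pt = pi_t @ (1.0 - np.diag(Pt))                          # expected acceptance, pseudo-marginal
mean_abs_dev = Qw @ np.abs(w - 1.0)                            # int Q_x(dw) |w - 1|
bound = pi @ ((1.0 - rho) * mean_abs_dev)
print(f"alpha_P={alpha_P:.4f}  alpha_P~={alpha_Pt:.4f}  gap={alpha_P - alpha_Pt:.4f}")
print(f"bounds: {bound:.4f} <= {pi @ mean_abs_dev:.4f}")
```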

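Remark 4. Corollary 3 also implies the following bounds:

$$\alpha_P - \alpha_{\tilde P} \le \alpha_P \sup_{x \in \mathsf{X}} \int Q_x(dw)\, |1 - w| \qquad\text{and}\qquad \alpha_P - \alpha_{\tilde P} \le \alpha_P^{1/q} \Big( \int \pi(dx)\, Q_x(dw)\, |1 - w|^p \Big)^{1/p},$$

where $p, q > 1$ with $1/p + 1/q = 1$.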



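We now define the notion of asymptotic variance for scaled ergodic averages of a Markov chain.

Definition 5. Let $\Pi$ be a reversible Markov kernel with invariant distribution $\mu$ defined on some measurable space $(\mathsf{E}, \mathcal{B}(\mathsf{E}))$, and denote by $(X_k)_{k \ge 0}$ the corresponding Markov chain at stationarity, that is, such that $X_0 \sim \mu$. Suppose $f : \mathsf{E} \to \mathbb{R}$ satisfies $\mu(f^2) < \infty$. The asymptotic variance of $f$ under $\Pi$ is defined as

$$\mathrm{var}(f, \Pi) := \lim_{n \to \infty} \frac{1}{n}\, \mathbb{E}\bigg[ \Big( \sum_{k=1}^{n} [f(X_k) - \mu(f)] \Big)^2 \bigg] \in [0, \infty]. \tag{7}$$

Whenever the integrated autocorrelation time

$$\tau(f, \Pi) := 1 + 2 \sum_{k=1}^{\infty} \frac{\mathbb{E}\big[\bar f(X_0)\, \bar f(X_k)\big]}{\mathrm{var}_\mu(f)}, \quad\text{where } \bar f := f - \mu(f) \text{ and } \mathrm{var}_\mu(f) := \mu(\bar f^2),$$

exists and is finite, then $\mathrm{var}(f, \Pi) = \tau(f, \Pi)\, \mathrm{var}_\mu(f) \in [0, \infty)$.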
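For a finite state space, both quantities in Definition 5 can be computed exactly. The Python sketch below evaluates $\mathrm{var}(f, \Pi)$ through the fundamental matrix $Z = (I - \Pi + \mathbf{1}\mu^\top)^{-1}$, using the standard identity $\mathrm{var}(f, \Pi) = 2\langle \bar f, Z \bar f\rangle_\mu - \mathrm{var}_\mu(f)$ for ergodic finite chains (an assumption of the sketch, not a result from the text), and evaluates $\tau(f, \Pi)$ by summing autocovariances. For a hypothetical two-state chain with switching probabilities $a$ and $b$, $\bar f$ is an eigenvector with eigenvalue $\lambda = 1 - a - b$, so $\tau(f, \Pi) = (1 + \lambda)/(1 - \lambda)$, which equals 3 for $a = 0.2$, $b = 0.3$.

```python
import numpy as np

def asymptotic_variance(Pi, mu, f):
    """var(f, Pi) of (7) for an ergodic finite-state kernel Pi with invariant distribution mu,
    using Z = (I - Pi + 1 mu^T)^{-1}:  var(f, Pi) = 2 <f_bar, Z f_bar>_mu - var_mu(f)."""
    n = len(mu)
    f_bar = f - mu @ f
    Z = np.linalg.inv(np.eye(n) - Pi + np.outer(np.ones(n), mu))
    return 2.0 * mu @ (f_bar * (Z @ f_bar)) - mu @ f_bar**2

def iact(Pi, mu, f, n_lags=200):
    """tau(f, Pi) = 1 + 2 sum_{k>=1} E[f_bar(X_0) f_bar(X_k)] / var_mu(f), truncated at n_lags."""
    f_bar = f - mu @ f
    g, total = f_bar.copy(), 0.0
    for _ in range(n_lags):
        g = Pi @ g                                   # g = Pi^k f_bar
        total += mu @ (f_bar * g)                    # E[f_bar(X_0) f_bar(X_k)] at stationarity
    return 1.0 + 2.0 * total / (mu @ f_bar**2)

a, b = 0.2, 0.3                                      # hypothetical switching probabilities
Pi = np.array([[1 - a, a], [b, 1 - b]])
mu = np.array([b, a]) / (a + b)                      # invariant distribution (0.6, 0.4)
f = np.array([0.0, 1.0])
var_mu = mu @ (f - mu @ f) ** 2
print(iact(Pi, mu, f), asymptotic_variance(Pi, mu, f), iact(Pi, mu, f) * var_mu)
```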

Lemma 47 in Appendix A shows that the limit in (7) always exists (but may be infinite) and proves the relation between $\tau(f, \Pi)$ and $\mathrm{var}(f, \Pi)$. We now show that a pseudo-marginal algorithm is always dominated by its associated marginal algorithm in terms of asymptotic variance. The result can be regarded as an extension of Peskun's approach [22, 29]. We point out in the proof what makes the result not straightforward.

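Theorem 6. Assume $f : \mathsf{X} \to \mathbb{R}$ satisfies $\pi(f^2) < \infty$. Denote $\mathrm{var}(f, \tilde P) := \mathrm{var}(\tilde f, \tilde P)$, where $\tilde f(x, \cdot) := f(x)$.

(i) Then $\mathrm{var}(f, \tilde P) \ge \mathrm{var}(f, P)$.

(ii) More specifically,

$$\mathrm{var}(f, \tilde P) \ge \mathrm{var}(f, P) + \liminf_{\lambda \to 1^-} \big[\mathcal{A}_{\bar P}(g_\lambda) - \mathcal{A}_{\tilde P}(g_\lambda)\big],$$

where $\mathcal{A}_{\bar P}(g_\lambda)$ and $\mathcal{A}_{\tilde P}(g_\lambda)$ are defined in Proposition 2 and $g_\lambda(x, y) := [\phi_\lambda(x) - \phi_\lambda(y)]^2$ with $\phi_\lambda(x) := \sum_{i=0}^{\infty} \lambda^i [P^i f(x) - \pi(f)]$ for $\lambda \in [0, 1)$.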
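Proof. Our proof is inspired by the proof of Tierney [29, Theorem 4], but we cannot use his argument directly because Proposition 2 does not apply to functions depending also on $u$ and $w$. Observe first from the definition of $\bar P$ that a Markov chain $(\bar X_n, \bar W_n)_{n \ge 0}$ with kernel $\bar P$ and with $(\bar X_0, \bar W_0) \sim \tilde\pi$ coincides marginally with the marginal chain; that is, $(X_n)_{n \ge 0}$ following $P$ with $X_0 \sim \pi$ and $(\bar X_n)_{n \ge 0}$ have the same distribution. Therefore, $\mathrm{var}(f, \bar P) = \mathrm{var}(f, P)$. We denote

$$\bar f(x) := f(x) - \pi(f) \in L_0^2(\mathsf{X}, \pi) := \{f : \mathsf{X} \to \mathbb{R} : \pi(f) = 0,\ \pi(f^2) < \infty\},$$

and with a slight abuse of notation define $\bar f(x, w) := \bar f(x)$ for all $(x, w) \in \mathsf{X} \times \mathsf{W}$. Notice that $\bar f \in L_0^2(\mathsf{X} \times \mathsf{W}, \tilde\pi)$. For $\lambda \in [0, 1)$, we define the auxiliary quantities

$$\mathrm{var}_\lambda(f, H) := \big\langle \bar f, (I - \lambda H)^{-1}(I + \lambda H)\bar f \big\rangle_{\tilde\pi}$$

for any Markov kernel $H$ reversible with respect to $\tilde\pi$, where $I$ stands for the identity operator. We note from Lemma 46 in Appendix A that the quantity $\mathrm{var}_\lambda(f, H)$ is well-defined, and from Lemma 47 that it is sufficient to show that $\mathrm{var}_\lambda(f, \bar P) \le \mathrm{var}_\lambda(f, \tilde P)$ holds for all $\lambda \in [0, 1)$ in order to establish (i).

Using the notation of Lemma 48 with $P_1 = \bar P$ and $P_2 = \tilde P$, we can write

$$\begin{aligned}
\mathrm{var}_\lambda(f, \tilde P) - \mathrm{var}_\lambda(f, \bar P) &= \langle \bar f, \Delta_\lambda(1)\bar f\rangle_{\tilde\pi} - \langle \bar f, \Delta_\lambda(0)\bar f\rangle_{\tilde\pi} \\
&= \int_0^1 \langle \bar f, \Delta_\lambda'(\beta)\bar f\rangle_{\tilde\pi}\, d\beta \\
&= \langle \bar f, \Delta_\lambda'(0)\bar f\rangle_{\tilde\pi} + \int_0^1 \!\! \int_0^\beta \langle \bar f, \Delta_\lambda''(\gamma)\bar f\rangle_{\tilde\pi}\, d\gamma\, d\beta.
\end{aligned} \tag{8}$$

Note that if $\bar P$ and $\tilde P$ satisfied Peskun's order, then the second line would be sufficient to conclude [?]. We show now that both terms on the right-hand side of the last line are non-negative.

First observe that by Lemma 48

$$\langle \bar f, \Delta_\lambda'(0)\bar f\rangle_{\tilde\pi} = 2\lambda \big\langle \bar f, (I - \lambda \bar P)^{-1}(\tilde P - \bar P)(I - \lambda \bar P)^{-1}\bar f \big\rangle_{\tilde\pi} = 2\lambda \langle \phi_\lambda, (\tilde P - \bar P)\phi_\lambda \rangle_{\tilde\pi},$$

due to the reversibility of $\bar P$, where $\phi_\lambda := (I - \lambda \bar P)^{-1}\bar f = \sum_{i=0}^{\infty} \lambda^i \bar P^i \bar f$ is well-defined by Lemma 46. We notice that $\phi_\lambda(x, w) = \phi_\lambda(x)$, and a straightforward calculation shows that

$$\begin{aligned}
\langle \phi_\lambda, (\tilde P - \bar P)\phi_\lambda \rangle_{\tilde\pi} &= \int \tilde\pi(dx, dw)\, \phi_\lambda(x)\, \phi_\lambda(y) \big(\tilde P(x, w; dy, du) - \bar P(x, w; dy, du)\big) \\
&= \frac12 \int \big(\phi_\lambda(x) - \phi_\lambda(y)\big)^2\, \tilde\pi(dx, dw) \big(\bar P(x, w; dy, du) - \tilde P(x, w; dy, du)\big) \\
&= \frac12 \big[\mathcal{A}_{\bar P}(g_\lambda) - \mathcal{A}_{\tilde P}(g_\lambda)\big],
\end{aligned}$$

with $g_\lambda(x, y) = (\phi_\lambda(x) - \phi_\lambda(y))^2$, and Proposition 2 yields $\langle \bar f, \Delta_\lambda'(0)\bar f\rangle_{\tilde\pi} \ge 0$. We therefore turn our attention to

$$\begin{aligned}
\langle \bar f, \Delta_\lambda''(\gamma)\bar f\rangle_{\tilde\pi} &= 4\lambda^2 \big\langle \bar f, (I - \lambda H_\gamma)^{-1}(\tilde P - \bar P)(I - \lambda H_\gamma)^{-1}(\tilde P - \bar P)(I - \lambda H_\gamma)^{-1}\bar f \big\rangle_{\tilde\pi} \\
&= 4\lambda^2 \big\langle \varphi, (I - \lambda H_\gamma)^{-1}\varphi \big\rangle_{\tilde\pi},
\end{aligned}$$

where $\varphi := (\tilde P - \bar P)(I - \lambda H_\gamma)^{-1}\bar f$, by the reversibility of $\tilde P$ and $\bar P$ and of the interpolated kernel $H_\gamma := \bar P + \gamma(\tilde P - \bar P)$. It is easy to check that $\varphi \in L_0^2(\mathsf{X} \times \mathsf{W}, \tilde\pi)$, so we may conclude (i) by applying Lemma 47, implying $\langle \varphi, (I - \lambda H_\gamma)^{-1}\varphi\rangle_{\tilde\pi} \ge 0$.

The specific lower bound (ii) follows from (8) by letting $\lambda \to 1^-$, because the double integral term is always non-negative and $\langle \bar f, \Delta_\lambda'(0)\bar f\rangle_{\tilde\pi} = \lambda[\mathcal{A}_{\bar P}(g_\lambda) - \mathcal{A}_{\tilde P}(g_\lambda)]$. □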
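Theorem 6 can be illustrated on a finite state space, where asymptotic variances are available in closed form. The Python sketch below computes $\mathrm{var}(f, \Pi)$ through the fundamental matrix $Z = (I - \Pi + \mathbf{1}\mu^\top)^{-1}$ and builds both $P$ and $\tilde P$ for a hypothetical four-state target with weights supported on $\{0.25, 1, 4\}$; the function names and all numbers are illustrative assumptions. By Theorem 6(i), each printed $\mathrm{var}(f, \tilde P)$ should be at least $\mathrm{var}(f, P)$.

```python
import numpy as np

def pm_kernels(pi, q, w, Qw):
    """Marginal kernel (1), pseudo-marginal kernel (4) and pi~ on a finite space (sketch).

    q must have a zero diagonal; Qw[x, k] = Q_x({w[k]}) with mean one in each row."""
    m, K = Qw.shape
    safe = np.where(q > 0, pi[:, None] * q, 1.0)
    r = np.where(q > 0, pi[None, :] * q.T / safe, 0.0)           # r(x, y)
    P = q * np.minimum(1.0, r)
    P[np.diag_indices(m)] += 1.0 - P.sum(axis=1)                 # rejection rho(x)
    acc = np.minimum(1.0, r[:, None, :, None] * w[None, None, None, :] / w[None, :, None, None])
    Pt = (q[:, None, :, None] * Qw[None, None, :, :] * acc).reshape(m * K, m * K)
    Pt[np.diag_indices(m * K)] += 1.0 - Pt.sum(axis=1)           # rejection rho~(x, w)
    return P, Pt, (pi[:, None] * Qw * w[None, :]).ravel()

def asymptotic_variance(Pi, mu, f):
    """var(f, Pi) = 2 <f_bar, Z f_bar>_mu - var_mu(f) with Z = (I - Pi + 1 mu^T)^{-1}."""
    n = len(mu)
    f_bar = f - mu @ f
    Z = np.linalg.inv(np.eye(n) - Pi + np.outer(np.ones(n), mu))
    return 2.0 * mu @ (f_bar * (Z @ f_bar)) - mu @ f_bar**2

pi = np.array([0.1, 0.2, 0.3, 0.4])                             # hypothetical target
m = len(pi)
q = (np.ones((m, m)) - np.eye(m)) / (m - 1)                     # uniform proposal on other states
w = np.array([0.25, 1.0, 4.0])                                  # hypothetical weight support
a = np.array([0.2, 0.4, 0.6, 0.7])
Qw = np.stack([a, 1.0 - 1.25 * a, 0.25 * a], axis=1)            # mean: 0.25a + (1 - 1.25a) + a = 1
P, Pt, pi_t = pm_kernels(pi, q, w, Qw)

rng = np.random.default_rng(1)
results = []
for f in [np.arange(m, dtype=float), rng.standard_normal(m), rng.standard_normal(m)]:
    v_marg = asymptotic_variance(P, pi, f)
    v_pm = asymptotic_variance(Pt, pi_t, np.repeat(f, len(w)))  # f~(x, w) = f(x)
    results.append((v_marg, v_pm))
    print(f"var(f, P) = {v_marg:.4f}   var(f, P~) = {v_pm:.4f}")
```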

3. Geometric ergodicity when the marginal algorithm is geometric and the weights are bounded

We consider now an order between the spectral gaps of the pseudo-marginal kernel $\tilde P$ and the auxiliary kernel $\bar P$ in (6). In particular, we find that if $w$ is always bounded from above by $\bar w \in [1, \infty)$, that is, $\mathsf{W} = [0, \bar w]$, and $P$ has a non-zero spectral gap (i.e. $P$ is geometrically ergodic; see [26, Proposition 2.1]), then $\tilde P$ has a non-zero spectral gap as well. We will also examine the asymptotic variance constants using the spectral gap order.

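Suppose $f : \mathsf{X} \times \mathsf{W} \to \mathbb{R}$ is integrable with respect to $\tilde\pi$. We denote in this section the function centred with respect to $w$ as

$$\hat f(x, w) := f(x, w) - f_0(x), \quad\text{where}\quad f_0(x) := \pi_x\big(f(x, \cdot)\big) = \int_0^\infty f(x, w)\, \pi_x(dw).$$

The Dirichlet form related to a Markov kernel $\Pi$ with invariant distribution $\mu$ and a function $g$ is given as

$$\mathcal{E}_\Pi(g) := \langle g, (I - \Pi) g\rangle_\mu = \frac12 \iint \mu(dx)\, \Pi(x, dy)\, [g(x) - g(y)]^2, \tag{9}$$

where $I$ is the identity operator. The spectral gap is defined through

$$\mathrm{Gap}(\Pi) := \inf_{g:\, \mathrm{var}_\mu(g) > 0} \frac{\mathcal{E}_\Pi(g)}{\mathrm{var}_\mu(g)} = \inf_{g:\, \mu(g) = 0,\ \|g\|_\mu = 1} \mathcal{E}_\Pi(g), \tag{10}$$

where $\mathrm{var}_\mu(g)$ is given in Definition 5.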
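On a finite state space with a reversible kernel, the infimum in (10) is attained at an eigenvector, so $\mathrm{Gap}(\Pi)$ is one minus the second-largest eigenvalue of $\Pi$. The Python sketch below computes it through the symmetrised matrix $D^{1/2}\Pi D^{-1/2}$, with $D = \mathrm{diag}(\mu)$, and checks for a hypothetical three-state Metropolis-Hastings chain that random test functions never give a Dirichlet-form ratio below the gap. The helper names are illustrative.

```python
import numpy as np

def spectral_gap(Pi, mu):
    """Gap(Pi) in (10) for a finite, irreducible, mu-reversible kernel: one minus the
    second-largest eigenvalue. Reversibility makes D^{1/2} Pi D^{-1/2} symmetric."""
    d = np.sqrt(mu)
    S = d[:, None] * Pi / d[None, :]
    eig = np.linalg.eigvalsh(0.5 * (S + S.T))        # eigenvalues in ascending order
    return 1.0 - eig[-2]

def dirichlet_ratio(Pi, mu, g):
    """E_Pi(g) / var_mu(g), with the Dirichlet form (9) computed as a double sum."""
    dirichlet = 0.5 * np.sum(mu[:, None] * Pi * (g[:, None] - g[None, :]) ** 2)
    return dirichlet / (mu @ (g - mu @ g) ** 2)

# Hypothetical example: Metropolis-Hastings with a uniform proposal on three states.
pi = np.array([0.2, 0.3, 0.5])
q = (np.ones((3, 3)) - np.eye(3)) / 2
P = q * np.minimum(1.0, pi[None, :] / pi[:, None])   # symmetric q: r(x, y) = pi(y) / pi(x)
P[np.diag_indices(3)] += 1.0 - P.sum(axis=1)

gap = spectral_gap(P, pi)
rng = np.random.default_rng(0)
ratios = [dirichlet_ratio(P, pi, rng.standard_normal(3)) for _ in range(1000)]
print(f"Gap(P) = {gap:.4f}, smallest sampled ratio = {min(ratios):.4f}")
```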

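Proposition 7. The spectral gap of $\bar P$ defined in (6) satisfies

$$\mathrm{Gap}(P) \wedge \Big(1 - \operatorname*{ess\,sup}_{x \in \mathsf{X}} \rho(x)\Big) \le \mathrm{Gap}(\bar P) \le \mathrm{Gap}(P),$$

where the essential supremum is with respect to $\pi$.

Proof. Let $f : \mathsf{X} \times \mathsf{W} \to \mathbb{R}$ with $\tilde\pi(f) = 0$ and $\|f\|_{\tilde\pi} = 1$, and compute

$$\begin{aligned}
\mathcal{E}_{\bar P}(f) - \mathcal{E}_P(f_0) &= \frac12 \int \pi(dx)\, \pi_x(dw)\, q(x, dy)\, \pi_y(du) \min\{1, r(x,y)\} \Big( [f(x, w) - f(y, u)]^2 - [f_0(x) - f_0(y)]^2 \Big) \\
&= \int \pi(dx)\, \pi_x(dw)\, q(x, dy) \min\{1, r(x,y)\} \big[f^2(x, w) - f_0^2(x)\big] \\
&= \int \pi(dx)\, \pi_x(dw)\, [f(x, w) - f_0(x)]^2 \big(1 - \rho(x)\big).
\end{aligned}$$

In other words,

$$\mathcal{E}_{\bar P}(f) = \mathcal{E}_P(f_0) + \int \pi(dx)\, \pi_x(dw)\, \big(1 - \rho(x)\big)\, \hat f^2(x, w). \tag{11}$$

If $\mathrm{var}_\pi(f_0) > 0$, then we have by (10) and (11)

$$\begin{aligned}
\mathcal{E}_{\bar P}(f) &\ge \mathrm{Gap}(P)\, \mathrm{var}_\pi(f_0) + \int \pi(dx)\, \pi_x(dw)\, \big(1 - \rho(x)\big)\, \hat f^2(x, w) \\
&\ge \mathrm{Gap}(P)\big(1 - \tilde\pi(\hat f^2)\big) + \Big(1 - \operatorname*{ess\,sup}_{x \in \mathsf{X}} \rho(x)\Big)\, \tilde\pi(\hat f^2),
\end{aligned} \tag{12}$$

where we have used that $1 = \mathrm{var}_{\tilde\pi}(f) = \mathrm{var}_\pi(f_0) + \tilde\pi(\hat f^2)$ by the variance decomposition identity. We notice that (12) holds also when $\mathrm{var}_\pi(f_0) = 0$. We conclude with the bound $\mathcal{E}_{\bar P}(f) \ge \mathrm{Gap}(P) \wedge (1 - \operatorname{ess\,sup}_{x \in \mathsf{X}} \rho(x))$, which holds for all $\|f\|_{\tilde\pi} = 1$ with $\tilde\pi(f) = 0$, implying the first inequality.

For the second inequality, note that if $f(x, w) = f_0(x)$ for all $(x, w) \in \mathsf{X} \times \mathsf{W}$, then $\pi(f_0) = 0$ and $\pi(f_0^2) = 1$. Consequently, $\mathcal{E}_{\bar P}(f) = \mathcal{E}_P(f_0)$. Therefore, $\mathrm{Gap}(\bar P) \le \mathrm{Gap}(P)$. □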

Remark 8. In the case where $\pi$ is not concentrated on points, that is, $\pi(\{x\}) = 0$ for all $x \in \mathsf{X}$, the statement of Proposition 7 simplifies to $\mathrm{Gap}(\bar P) = \mathrm{Gap}(P)$, because then $1 - \operatorname{ess\,sup}_{x \in \mathsf{X}} \rho(x) \ge \mathrm{Gap}(P)$ by Lemma 49 in Appendix B.

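Proposition 9. Suppose that there exists a constant $\bar w \in [1, \infty)$ such that

$$Q_x([0, \bar w]) = 1 \quad\text{for } \pi\text{-almost every } x \in \mathsf{X}. \tag{13}$$

Then the Dirichlet form of the pseudo-marginal algorithm satisfies

$$\mathcal{E}_{\tilde P}(f) \ge \bar w^{-1}\, \mathcal{E}_{\bar P}(f)$$

for any function with $\tilde\pi(f^2) < \infty$, implying $\mathrm{Gap}(\tilde P) \ge \bar w^{-1}\, \mathrm{Gap}(\bar P)$.

Proof. Because $\min\{1, ab\} \ge \min\{1, a\} \min\{1, b\}$ for all $a, b \ge 0$, and $Q_y(du) = u^{-1}\pi_y(du)$ on $\{u > 0\}$, we have

$$\begin{aligned}
2\mathcal{E}_{\tilde P}(f) &= \int \tilde\pi(dx, dw)\, q(x, dy)\, Q_y(du) \min\Big\{1, r(x,y)\frac{u}{w}\Big\} [f(x, w) - f(y, u)]^2 \\
&\ge \int_{u > 0} \tilde\pi(dx, dw)\, q(x, dy)\, \pi_y(du) \min\{1, r(x,y)\} \min\Big\{\frac{1}{u}, \frac{1}{w}\Big\} [f(x, w) - f(y, u)]^2 \\
&\ge 2\bar w^{-1}\, \mathcal{E}_{\bar P}(f),
\end{aligned}$$

where the last inequality uses $\min\{1/u, 1/w\} \ge \bar w^{-1}$ for $u, w \in (0, \bar w]$. □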

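Corollary 10. Assume $\mathrm{Gap}(P) > 0$ and that there exists some $\bar w \in [1, \infty)$ such that (13) holds. Let $g : \mathsf{X} \to \mathbb{R}$ satisfy $\pi(g^2) < \infty$. Then the asymptotic variances (Definition 5) satisfy

$$\mathrm{var}(g, P) \le \mathrm{var}(g, \tilde P) \le \bar w\, \mathrm{var}(g, P) + (\bar w - 1)\, \mathrm{var}_\pi(g),$$

where $\mathrm{var}(g, \tilde P) := \mathrm{var}(\tilde g, \tilde P)$ with $\tilde g(x, \cdot) = g(x)$.

Proof. Proposition 9 implies $\langle f, (I - \tilde P) f\rangle_{\tilde\pi} \ge \langle f, \bar w^{-1}(I - \bar P) f\rangle_{\tilde\pi}$ for all functions with $\tilde\pi(f^2) < \infty$, and a lemma in Appendix B implies

$$\big\langle \bar g, (I - \tilde P)^{-1}\bar g \big\rangle_{\tilde\pi} \le \bar w\, \big\langle \bar g, (I - \bar P)^{-1}\bar g \big\rangle_{\tilde\pi},$$

where $\bar g := g - \pi(g)$. Now note that $\mathrm{var}_{\tilde\pi}(g) = \mathrm{var}_\pi(g)$ and $\mathrm{var}(g, \bar P) = \mathrm{var}(g, P)$ hold because $\bar P$ and $P$ coincide marginally; see the proof of Theorem 6. The above, together with Theorem 6 and the identity $\mathrm{var}(g, H) + \mathrm{var}_{\tilde\pi}(g) = 2\langle \bar g, (I - H)^{-1}\bar g\rangle_{\tilde\pi}$ for $H \in \{\tilde P, \bar P\}$, implies

$$\mathrm{var}_\pi(g) + \mathrm{var}(g, P) \le \mathrm{var}_\pi(g) + \mathrm{var}(g, \tilde P) \le \bar w \big[\mathrm{var}_\pi(g) + \mathrm{var}(g, P)\big],$$

and allows us to conclude. □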
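The spectral gap comparisons of Propositions 7 and 9 and the variance sandwich of Corollary 10 can all be checked on a small finite example with bounded weights. The Python sketch below builds $P$, $\tilde P$ and the auxiliary kernel $\bar P$ of (6) on a finite space, for a hypothetical four-state target and weight distributions supported on $\{0.5, 1, 2\}$, so that $\bar w = 2$ in (13). The helper names and all numbers are illustrative assumptions.

```python
import numpy as np

def kernels(pi, q, w, Qw):
    """P (1), pseudo-marginal P~ (4), auxiliary P-bar (6) and pi~ on a finite space (sketch).
    q has a zero diagonal; Qw[x, k] = Q_x({w[k]}) with mean one in each row."""
    m, K = Qw.shape
    safe = np.where(q > 0, pi[:, None] * q, 1.0)
    r = np.where(q > 0, pi[None, :] * q.T / safe, 0.0)
    A = q * np.minimum(1.0, r)                                   # q(x, y) min{1, r(x, y)}
    P = A.copy()
    P[np.diag_indices(m)] += 1.0 - P.sum(axis=1)
    acc = np.minimum(1.0, r[:, None, :, None] * w[None, None, None, :] / w[None, :, None, None])
    Pt = (q[:, None, :, None] * Qw[None, None, :, :] * acc).reshape(m * K, m * K)
    Pt[np.diag_indices(m * K)] += 1.0 - Pt.sum(axis=1)
    pi_y = Qw * w[None, :]                                       # pi_y(du) = Q_y(du) u
    Pb = (A[:, None, :, None] * pi_y[None, None, :, :] * np.ones((1, K, 1, 1))).reshape(m * K, m * K)
    Pb[np.diag_indices(m * K)] += 1.0 - Pb.sum(axis=1)           # rejection rho(x)
    return P, Pt, Pb, (pi[:, None] * pi_y).ravel()

def spectral_gap(Pi, mu):
    d = np.sqrt(mu)
    S = d[:, None] * Pi / d[None, :]
    return 1.0 - np.linalg.eigvalsh(0.5 * (S + S.T))[-2]

def asymptotic_variance(Pi, mu, f):
    n = len(mu)
    f_bar = f - mu @ f
    Z = np.linalg.inv(np.eye(n) - Pi + np.outer(np.ones(n), mu))
    return 2.0 * mu @ (f_bar * (Z @ f_bar)) - mu @ f_bar**2

pi = np.array([0.1, 0.2, 0.3, 0.4])
m = len(pi)
q = (np.ones((m, m)) - np.eye(m)) / (m - 1)
w = np.array([0.5, 1.0, 2.0])                                    # support bounded by w_bar = 2
a = np.array([0.1, 0.3, 0.5, 0.6])
Qw = np.stack([a, 1.0 - 1.5 * a, 0.5 * a], axis=1)
P, Pt, Pb, pi_t = kernels(pi, q, w, Qw)
w_bar, rho = w.max(), np.diag(P)

gap_P, gap_Pt, gap_Pb = spectral_gap(P, pi), spectral_gap(Pt, pi_t), spectral_gap(Pb, pi_t)
f = np.arange(m, dtype=float)
v_P = asymptotic_variance(P, pi, f)
v_Pt = asymptotic_variance(Pt, pi_t, np.repeat(f, len(w)))
var_pi = pi @ (f - pi @ f) ** 2
print(f"Gap(P)={gap_P:.4f}  Gap(P-bar)={gap_Pb:.4f}  Gap(P~)={gap_Pt:.4f}")
print(f"{v_P:.4f} <= {v_Pt:.4f} <= {w_bar * v_P + (w_bar - 1) * var_pi:.4f}")
```

In this finite example $\bar P$ acts as $P$ on functions of $x$ alone and multiplies functions with $\pi_x$-mean zero by $\rho(x)$, so $\mathrm{Gap}(\bar P) = \mathrm{Gap}(P) \wedge (1 - \max_x \rho(x))$ exactly, which illustrates why the lower bound of Proposition 7 involves $\rho$.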
Remark 11. From the proof of Proposition 9, one observes that in fact

$$\mathrm{Gap}(\tilde P) \ge \mathrm{Gap}(\check P) \ge \bar w^{-1}\, \mathrm{Gap}(\bar P),$$

where $\check P$ is the Markov kernel with the proposal $q(x, dy)\, Q_y(du)$ and the acceptance probability $\min\{1, r(x,y)\} \min\{1, u/w\}$, which is reversible with respect to $\tilde\pi$. This implies, repeating the arguments in the proof of Corollary 10, that $\mathrm{var}(f, \tilde P) \le \mathrm{var}(f, \check P)$ for all $f$ with $\pi(f^2) < \infty$.

Next we show that the boundedness of the support of the weight distributions $Q_x$ for essentially all $x \in \mathsf{X}$ is a necessary condition for geometric ergodicity of the pseudo-marginal algorithm. The result is similar to Theorem 8 in [3], but its proof is different and the statement more explicit.

Proposition 12. If the pseudo-marginal kernel $\tilde P$ has a non-zero spectral gap, then there exists a function $\bar w : \mathsf{X} \to [1, \infty)$ such that $Q_x([0, \bar w(x)]) = 1$ for $\pi$-a.e. $x \in \mathsf{X}$.

Proof. We proceed by contradiction. Assume that there exists a set $A \in \mathcal{B}(\mathsf{X})$ with $\pi(A) > 0$ such that $Q_x([0, w]) < 1$ for all $x \in A$ and all $w \in [1, \infty)$. Fix $\epsilon > 0$ and let $\bar w : A \to [1, \infty)$ be a measurable function such that $1 - \tilde\rho(x, w) < \epsilon$ for all $x \in A$ and $w \ge \bar w(x)$, and such that $\tilde\pi(\bar A) \in (0, 1/2)$, where $\bar A := \{(x, w) \in \mathsf{X} \times \mathsf{W} : x \in A,\ w \ge \bar w(x)\}$. We now apply Lemma 49 in Appendix B with the set $\bar A$ to conclude that $\mathrm{Gap}(\tilde P) \le (1 + (1 - \tilde\pi(\bar A))^{-1})\epsilon \le 3\epsilon$. Since $\epsilon > 0$ is arbitrary, this contradicts $\mathrm{Gap}(\tilde P) > 0$. □

4. Convergence of the asymptotic variance 

In standard applications of the pseudo-marginal algorithm, one typically selects $Q_x$ from a family of possible distributions $\{Q_x^N\}_{N \in \mathbb{N}}$ indexed by some precision parameter $N$, which reflects the concentration of $Q_x^N$ on 1. In most relevant scenarios we are aware of, $N \in \mathbb{N}$ corresponds to the number of samples, particles or iterates of an algorithm used to compute an unbiased estimator of the density value, as exemplified in (5). It should be clear that this is not a restriction. Hereafter, we denote the pseudo-marginal kernels and the invariant measures associated with $\{Q_x^N\}_{x \in \mathsf{X}}$ as $\tilde P_N$ and $\tilde\pi_N$, respectively.

It is easy to see that if for all $x \in \mathsf{X}$, $\pi_x^N(dw) = Q_x^N(dw)\, w \to \delta_1(dw)$ weakly as $N \to \infty$, then $\tilde\pi_N(dx, dw) \to \pi(dx)\, \delta_1(dw)$ weakly, suggesting that a pseudo-marginal algorithm with invariant distribution $\tilde\pi_N$ may become similar to the marginal algorithm with invariant distribution $\pi$ as $N \to \infty$. As pointed out earlier, whenever $W_x$ is not bounded uniformly, a pseudo-marginal algorithm cannot be geometric, although its marginal algorithm may be. In fact, it was shown in [?, Remark 1] that even in situations where the weights are uniformly bounded and the pseudo-marginal algorithm is uniformly ergodic, increasing $N$ may not improve the rate of convergence of the algorithm; that is, there is no convergence in terms of the rate of convergence.

In this section we nevertheless show that in many situations such a convergence takes place in terms of the asymptotic variance or, equivalently, the integrated autocorrelation time; see Definition 5. More precisely, we show here that under simple conditions $\mathrm{var}(g, \tilde P_N) \to \mathrm{var}(g, P)$ as $N \to \infty$. We start with a very simple result, which is a direct consequence of Corollary 10.
Proposition 13. Suppose that the marginal kernel $P$ has a non-zero spectral gap and the weight distributions are bounded uniformly in $x \in \mathsf{X}$ by $\bar w_N \in (1, \infty)$, that is, $Q_x^N([0, \bar w_N]) = 1$ for all $x \in \mathsf{X}$ and $N \ge N_0$ for some $N_0 \in \mathbb{N}$, and $\lim_{N \to \infty} \bar w_N = 1$. Then $\lim_{N \to \infty} \mathrm{var}(g, \tilde P_N) = \mathrm{var}(g, P)$ for any $g : \mathsf{X} \to \mathbb{R}$ with $\pi(g^2) < \infty$.

Proof. The result is a direct consequence of Corollary 10. □
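The mechanism behind Proposition 13 can be seen in a finite example. If $Q_x^N$ is uniform on $\{1 - c_N, 1 + c_N\}$ with $c_N = 0.5/N$ (a hypothetical family with $\bar w_N = 1 + c_N \to 1$), Corollary 10 gives $0 \le \mathrm{var}(g, \tilde P_N) - \mathrm{var}(g, P) \le c_N[\mathrm{var}(g, P) + \mathrm{var}_\pi(g)]$. The Python sketch below computes both asymptotic variances exactly for a four-state target and prints the difference; all names and numbers are illustrative assumptions.

```python
import numpy as np

def pm_kernels(pi, q, w, Qw):
    """Marginal kernel (1), pseudo-marginal kernel (4) and pi~ on a finite space (sketch)."""
    m, K = Qw.shape
    safe = np.where(q > 0, pi[:, None] * q, 1.0)
    r = np.where(q > 0, pi[None, :] * q.T / safe, 0.0)
    P = q * np.minimum(1.0, r)
    P[np.diag_indices(m)] += 1.0 - P.sum(axis=1)
    acc = np.minimum(1.0, r[:, None, :, None] * w[None, None, None, :] / w[None, :, None, None])
    Pt = (q[:, None, :, None] * Qw[None, None, :, :] * acc).reshape(m * K, m * K)
    Pt[np.diag_indices(m * K)] += 1.0 - Pt.sum(axis=1)
    return P, Pt, (pi[:, None] * Qw * w[None, :]).ravel()

def asymptotic_variance(Pi, mu, f):
    n = len(mu)
    f_bar = f - mu @ f
    Z = np.linalg.inv(np.eye(n) - Pi + np.outer(np.ones(n), mu))
    return 2.0 * mu @ (f_bar * (Z @ f_bar)) - mu @ f_bar**2

pi = np.array([0.1, 0.2, 0.3, 0.4])
m = len(pi)
q = (np.ones((m, m)) - np.eye(m)) / (m - 1)
g = np.arange(m, dtype=float)
var_pi = pi @ (g - pi @ g) ** 2
rows = []
for N in [1, 2, 4, 8, 16, 32]:
    c = 0.5 / N                                      # hypothetical: Q_x^N uniform on {1 - c, 1 + c}
    w = np.array([1.0 - c, 1.0 + c])
    Qw = np.full((m, 2), 0.5)
    P, Pt, pi_t = pm_kernels(pi, q, w, Qw)
    v_P = asymptotic_variance(P, pi, g)
    v_Pt = asymptotic_variance(Pt, pi_t, np.repeat(g, 2))
    rows.append((N, c, v_P, v_Pt))
    print(f"N={N:2d}  w_bar_N={1 + c:.4f}  var(g, P~_N) - var(g, P) = {v_Pt - v_P:.6f}")
```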

We now extend this result to situations where the distributions $\{Q_x^N\}_{N \in \mathbb{N}}$ may have an unbounded support, and therefore $\{\tilde P_N\}_{N \in \mathbb{N}}$ may not be geometrically ergodic. We formulate our result in terms of the following technical condition, assuming uniform convergence of the integrated autocorrelation series. We will return to this assumption towards the end of this section and show that it can be checked in practice with, for example, Lyapunov-type drift conditions (see Proposition 19).



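Condition 14. For $g : \mathsf{X} \to \mathbb{R}$, suppose that the integrated autocorrelation time $\tau(g, P)$ (Definition 5) is well-defined and finite. Denote by $(\tilde X_k^{(N)}, \tilde W_k^{(N)})_{k \ge 0}$ the Markov chain with initial distribution $\tilde\pi_N$ and kernel $\tilde P_N$. Assume that there exists a constant $N_0 < \infty$ such that

$$\lim_{n \to \infty} \sup_{N \ge N_0} \bigg| \sum_{k=n}^{\infty} \mathbb{E}\big[\bar g(\tilde X_0^{(N)})\, \bar g(\tilde X_k^{(N)})\big] \bigg| = 0,$$

where $\bar g = g - \pi(g)$.

The main result of this section is the following.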

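Theorem 15. Assume that $g : \mathsf{X} \to \mathbb{R}$ satisfies $\pi(|g|^{2+\delta}) < \infty$ for some $\delta > 0$, and that Condition 14 holds for $g$. Suppose also that

$$\lim_{N \to \infty} \int Q_x^N(dw)\, |1 - w| = 0 \quad\text{for all } x \in \mathsf{X}. \tag{14}$$

Then $\lim_{N \to \infty} \mathrm{var}(g, \tilde P_N) = \mathrm{var}(g, P)$.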

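Proof. If $\mathrm{var}_\pi(g) = 0$, the claim is trivial. If $\mathrm{var}_\pi(g) > 0$, our conditions imply that the autocorrelation times exist and are finite for both the marginal kernel $P$ and the pseudo-marginal kernels $\tilde P_N$ for $N \ge N_0$; this follows from the finiteness of the terms in the autocorrelation series, ensured by the Cauchy-Schwarz inequality, and Condition 14. Therefore, without loss of generality, we prove the claim for autocorrelation times, $\tau(g, \tilde P_N) \to \tau(g, P)$, for a function $g$ with $\tilde\pi_N(g) = \pi(g) = 0$ and $\tilde\pi_N(g^2) = \pi(g^2) = 1$.

Consider the Markov kernels $\bar P_N$ defined as in (6) with $Q_x^N$ and $\pi_x^N(dw) := Q_x^N(dw)\, w$. Denote by $(\bar X_k^{(N)}, \bar W_k^{(N)})_{k \ge 0}$ the corresponding stationary Markov chain with $(\bar X_0^{(N)}, \bar W_0^{(N)}) \sim \tilde\pi_N$. Denote similarly by $(\tilde X_k^{(N)}, \tilde W_k^{(N)})_{k \ge 0}$ the stationary Markov chain corresponding to the kernel $\tilde P_N$ with $(\tilde X_0^{(N)}, \tilde W_0^{(N)}) \sim \tilde\pi_N$. Notice that $\bar P_N$ and $\tilde\pi_N$ coincide marginally with $P$ and $\pi$, respectively; that is, $(\bar X_k^{(N)})_{k \ge 0}$ has the same distribution as the stationary marginal chain $(X_k)_{k \ge 0}$ with kernel $P$ and $X_0 \sim \pi$.

Choose $\epsilon \in (0, 1)$ and let $n_0 = n_0(\epsilon) < \infty$ be such that for all $N \ge N_0$

$$\bigg| \sum_{k=n_0}^{\infty} \mathbb{E}\big[g(\tilde X_0^{(N)})\, g(\tilde X_k^{(N)})\big] \bigg| < \epsilon \quad\text{and}\quad \bigg| \sum_{k=n_0}^{\infty} \mathbb{E}\big[g(X_0)\, g(X_k)\big] \bigg| < \epsilon, \tag{15}$$

where the existence of $n_0$ follows from Condition 14 and the finiteness of $\tau(g, P)$. We have for $N \ge N_0$

$$|\tau(g, P) - \tau(g, \tilde P_N)| < 4\epsilon + 2 \bigg| \sum_{k=1}^{n_0 - 1} \Big( \mathbb{E}\big[g(\bar X_0^{(N)})\, g(\bar X_k^{(N)})\big] - \mathbb{E}\big[g(\tilde X_0^{(N)})\, g(\tilde X_k^{(N)})\big] \Big) \bigg|.$$

In order to control the last term, we consider a coupling argument. Denote $q := (2 + \delta)/\delta \in (1, \infty)$. Lemma 16 applied with $\epsilon n_0^{-q-1}/2$ in place of $\epsilon$ implies the existence of $N_1 < \infty$ and a set $C \in \mathcal{B}(\mathsf{X}) \times \mathcal{B}(\mathsf{W})$ such that for all $N \ge N_1$, $\tilde\pi_N(C^c) \le \epsilon n_0^{-q-1}/2$ and

$$\|\tilde P_N(x, w; \cdot) - \bar P_N(x, w; \cdot)\| \le \epsilon n_0^{-q-1}/2 \quad\text{for all } (x, w) \in C.$$

A lemma in Appendix C, applied to $(\bar X_k^{(N)}, \bar W_k^{(N)})_{0 \le k \le n_0 - 1}$ and $(\tilde X_k^{(N)}, \tilde W_k^{(N)})_{0 \le k \le n_0 - 1}$ with the set $C$, shows that the laws of these processes, $\bar\mu$ and $\tilde\mu$ respectively, satisfy the following total variation inequality for all $N \ge N_1$:

$$\|\bar\mu - \tilde\mu\| \le 2 n_0\, \tilde\pi_N(C^c) + n_0 \sup_{(x, w) \in C} \|\bar P_N(x, w; \cdot) - \tilde P_N(x, w; \cdot)\| \le 2\epsilon n_0^{-q}.$$

Therefore, for all $N \ge N_1$, there exists a probability space $(\Omega_N, \mathcal{F}_N, \mathbb{P}_N)$ where both $(\bar X_k^{(N)}, \bar W_k^{(N)})_{0 \le k \le n_0 - 1}$ and $(\tilde X_k^{(N)}, \tilde W_k^{(N)})_{0 \le k \le n_0 - 1}$ are defined, and the set

$$A_N := \big\{ (\bar X_k^{(N)}, \bar W_k^{(N)}) \neq (\tilde X_k^{(N)}, \tilde W_k^{(N)}) \text{ for some } 0 \le k \le n_0 - 1 \big\}$$

satisfies $\mathbb{P}_N(A_N) = \frac12 \|\bar\mu - \tilde\mu\| \le \epsilon n_0^{-q}$ [e.g. ?, Theorem 5.2]. Denote $p = 1 + \delta/2$, and note that $1/p + 1/q = 1$. Now for $N \ge N_1$,

$$\begin{aligned}
&\bigg| \sum_{k=1}^{n_0 - 1} \mathbb{E}\big[g(\bar X_0^{(N)})\, g(\bar X_k^{(N)})\big] - \mathbb{E}\big[g(\tilde X_0^{(N)})\, g(\tilde X_k^{(N)})\big] \bigg| \\
&\quad = \bigg| \mathbb{E}_N\Big[ \mathbf{1}_{A_N} \sum_{k=1}^{n_0 - 1} \Big( g(\bar X_0^{(N)})\, g(\bar X_k^{(N)}) - g(\tilde X_0^{(N)})\, g(\tilde X_k^{(N)}) \Big) \Big] \bigg| \\
&\quad \le \mathbb{P}_N(A_N)^{1/q}\, (n_0 - 1) \max_{1 \le k \le n_0 - 1} \Big[ \big(\mathbb{E}|g(\bar X_0^{(N)})\, g(\bar X_k^{(N)})|^p\big)^{1/p} + \big(\mathbb{E}|g(\tilde X_0^{(N)})\, g(\tilde X_k^{(N)})|^p\big)^{1/p} \Big] \\
&\quad \le 2\epsilon^{1/q} \big(\pi(|g|^{2+\delta})\big)^{2/(2+\delta)},
\end{aligned}$$

by the Hölder, Minkowski and Cauchy-Schwarz inequalities. Since $\epsilon \in (0, 1)$ is arbitrary, we conclude that $\tau(g, \tilde P_N) \to \tau(g, P)$ as $N \to \infty$. □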



Let $\mu_1$ and $\mu_2$ be two probability distributions on the space $(\mathsf{E}, \mathcal{B}(\mathsf{E}))$. We define the total variation distance

$$\|\mu_1 - \mu_2\| := \sup_{|f| \le 1} |\mu_1(f) - \mu_2(f)| = 2 \sup_{0 \le f \le 1} |\mu_1(f) - \mu_2(f)| = 2 \sup_{A \in \mathcal{B}(\mathsf{E})} |\mu_1(A) - \mu_2(A)|.$$
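With this normalisation, $\|\mu_1 - \mu_2\|$ is twice the more common convention, and for distributions on a finite set it equals $\sum_x |\mu_1(\{x\}) - \mu_2(\{x\})|$. The short Python sketch below checks the first and last expressions against each other by brute force over all subsets; the function names are illustrative. Note also that $\frac12 \|\mu_1 - \mu_2\|$ is the smallest possible probability of $\{X \neq Y\}$ over couplings $(X, Y)$ of $\mu_1$ and $\mu_2$, which is the fact used for the set $A_N$ in the proof of Theorem 15.

```python
import itertools
import numpy as np

def tv_distance(mu1, mu2):
    """||mu1 - mu2|| as defined above, on a finite set: the supremum over |f| <= 1
    is attained at f = sign(mu1 - mu2), giving sum |mu1 - mu2|."""
    return float(np.abs(np.asarray(mu1, float) - np.asarray(mu2, float)).sum())

def tv_via_sets(mu1, mu2):
    """2 sup_A |mu1(A) - mu2(A)|, by brute force over all subsets A (small spaces only)."""
    d = np.asarray(mu1, float) - np.asarray(mu2, float)
    best = 0.0
    for size in range(len(d) + 1):
        for A in itertools.combinations(range(len(d)), size):
            best = max(best, abs(d[list(A)].sum()))
    return 2.0 * best

mu1, mu2 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]          # hypothetical distributions
print(tv_distance(mu1, mu2), tv_via_sets(mu1, mu2))
```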

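Lemma 16. Assume that (14) is satisfied. Then, for any $\epsilon > 0$ there exist $N_1 < \infty$ and a set $C \in \mathcal{B}(\mathsf{X}) \times \mathcal{B}(\mathsf{W})$ such that for all $N \ge N_1$, $\tilde\pi_N(C^c) \le \epsilon$ and

$$\|\tilde P_N(x, w; \cdot) - \bar P_N(x, w; \cdot)\| \le \epsilon \quad\text{for all } (x, w) \in C.$$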
Proof. Choose $\epsilon > 0$ and let $\bar w := 1 + \epsilon/8$. The dominated convergence theorem together with (14) implies for all $x \in \mathsf{X}$,

$$\lim_{N \to \infty} \int q(x, dy)\, Q_y^N(du)\, |1 - u| = 0, \tag{16}$$

$$\lim_{N \to \infty} \pi_x^N\big([\bar w^{-1}, \bar w]\big) = 1. \tag{17}$$

By Egorov's theorem, there exists a set $C' \in \mathcal{B}(\mathsf{X})$ such that $\pi(C'^c) < \epsilon/2$ and the convergence in both (16) and (17) is uniform in $x \in C'$.

For any $x \in \mathsf{X}$, any $w > 0$ and any set $A \in \mathcal{B}(\mathsf{X}) \times \mathcal{B}(\mathsf{W})$,

$$\begin{aligned}
|\tilde P_N(x, w; A) - \bar P_N(x, w; A)| &\le 2 \int q(x, dy)\, Q_y^N(du)\, \Big| \min\{1, r(x,y)\}\, u - \min\Big\{1, r(x,y)\frac{u}{w}\Big\} \Big| \\
&\le 2\Big|1 - \frac{1}{w}\Big| + 4 \int q(x, dy)\, Q_y^N(du)\, |1 - u|,
\end{aligned}$$

where the last inequality follows by Lemma 50 in Appendix B. Therefore, letting $C := C' \times [\bar w^{-1}, \bar w]$, we can bound the total variation by

$$\sup_{(x, w) \in C} \|\tilde P_N(x, w; \cdot) - \bar P_N(x, w; \cdot)\| \le \frac{\epsilon}{2} + 8 \sup_{x \in C'} \int q(x, dy)\, Q_y^N(du)\, |1 - u|.$$

Because $\lim_{N \to \infty} \tilde\pi_N(C^c) = \pi(C'^c) < \epsilon/2$, we may conclude by choosing $N_1 < \infty$ such that $\sup_{x \in C'} \int q(x, dy)\, Q_y^N(du)\, |1 - u| < \epsilon/16$ and $\tilde\pi_N(C^c) < \epsilon$ for all $N \ge N_1$. □

Remark 17. With additional assumptions on the rates of convergence in Condition 14 and (14), one could obtain a rate of convergence in Theorem 15, that is, find $\{r(N)\}_{N \in \mathbb{N}}$ such that

$$|\mathrm{var}(g, \tilde P_N) - \mathrm{var}(g, P)| \le r(N) \to 0 \quad\text{as } N \to \infty,$$

by going through the proofs of Theorem 15 and Lemma 16.

We now provide sufficient conditions implying the conditions of Theorem 15, in particular Condition 14, which essentially require quantitative bounds on the ergodic behaviour of the pseudo-marginal Markov chains. Our results rely on polynomial drift conditions, which we establish for some standard algorithms in Sections 6 and 7. Weaker drift conditions can be shown to imply Condition 14 [e.g. ?], but we do not detail this here in order to keep the presentation simple.

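Condition 18. There exist a function $V : \mathsf{X} \times \mathsf{W} \to [1, \infty)$, a set $C \in \mathcal{B}(\mathsf{X}) \times \mathcal{B}(\mathsf{W})$ with $\sup_{(x, w) \in C} V(x, w) < \infty$, constants $\alpha \in (0, 1]$, $b_V \in [0, \infty)$, $c_V \in (0, \infty)$ and $N_0 < \infty$, such that for all $N \ge N_0$

$$\tilde P_N V(x, w) \le V(x, w) - c_V V^\alpha(x, w) + b_V \mathbf{1}\{(x, w) \in C\} \quad\text{for all } x \in \mathsf{X},\ w \in \mathsf{W},$$

and for any $v \in [1, \infty)$ there exist probability measures $\{\nu_v^N\}_{N \ge N_0}$ and a constant $\epsilon_v \in (0, 1]$ such that for all $N \ge N_0$,

$$\tilde P_N(x, w; \cdot) \ge \epsilon_v\, \nu_v^N(\cdot) \quad\text{for all } (x, w) \in \mathsf{X} \times \mathsf{W} \text{ such that } V(x, w) \le v.$$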

Proposition 19. Assume ConditionlTSi holds for the pseudo-marginal kernels Pm. 
and that for some A G [0, 1) and n G [0, 1), 



\9{x)\ ^ «(1-A) 
sup < oo where a^A •= ^ ; — 



sup 7f;v((|^^| + l)^'"^") <00, 
N>No 



then Condition VT^ holds 



Proof. From the assumptions, there exists a finite constant R such that for all 
N >No and any (x, w), (x', w;') G X x W, 

Y,rmPN9i^,w)-PN9i^',w')\ < R\\9\\v-'^a{V'~''''{x,w) + V'-'^^ix' ,w') - 1), 

A;>0 

q(1-A)(1-k) 1—1 1—1 

where r{k) := (fc + 1) — )■ oo as /c — )■ oo |J, Corollary 12]; see also [2|, 
Proposition 3.4]. Note that we may write 



\^[x,w}[9{^, 



P^g{x,w)- / nN{,dy,du)P^g{y,u] 



< / nN{dy,du)\P^g{x,w) - P^g{y,u) 



Therefore, we have for n > 



k=n 



k=n 



< 



\g\\v°''^,^ r~ 



r(n) 



[^N{\g\V'~''') +7Ti\g\)n^iV'^'-)]. □ 



5. Sub-geometric ergodicity with uniformly ergodic marginal 

algorithm 

We consider the situation where the marginal algorithm is uniformly ergodic. 
This often corresponds to scenarios where the state space X C M"^ is compact. 
It turns out that when the weight distributions {Q^jxex do not have bounded 
supports but are uniformly integrable, then the corresponding pseudo-marginal 
algorithm satisfies a sub-geometric drift condition towards a set C := X x (0,w] 
for some w G (1, oo). Provided the marginal algorithm satisfies a practically mild 
additional condition in f|T8|) . the set C is guaranteed to be small for the pseudo- 
marginal chain. 

We start by assuming uniform integrability in a form given by the de la Vallee- 
Poussin theorem [e.g. |20|, p. 19 T22]. This allows us to quantify the strength of 
the sub-geometric drift in a convenient way, for example indicating that moment 
conditions imply polynomial drifts and consequently polynomial ergodicity. 
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Condition 20. There exists a non- decreasing convex function (p : [0, oo) — > [1, oo) 
satisfying 

liminf = oo and My/ := sup ( (j){w)Qx{dw) < oo. 
t xex J 

We record a simple implication of Condition [201 

Lemma 21. Assume Condition {Wi holds. Then, there exists a function a{w) : 
[0, oo) — [0, oo) depending only on Mw and cj) such that 

sup / uQy{du) < a{w) and lim a{w) = 0. 

yGX Ju>w ""^-^ 
Proof. For any function / : [0, oo) — ?■ [0, oo) non- decreasing in [w, oo), we have 

/ uQy{du) < / u——Qy{du). 

Ju>w J JV^) 

The function f{w) := (f){w)/w is non- decreasing for w sufficiently large, therefore 



sup / uQy{du) < Mw—j—T ='■ a{w) "'~^°"> 0. □ 
yex Ju>w n^) 

The next result estabhshes a drift away from large values of w for the pseudo- 
marginal chain, given that the marginal algorithm has an acceptance probability 
uniformly bounded away from zero. All uniformly (and geometrically) ergodic 
Markov chains satisfy this property j27l . Proposition 5.1]. 

Proposition 22. Suppose that the one-step expected acceptance probability of the 
marginal algorithm is bounded away from zero, 



"0 



inf / g(a;, dy) min{l, r(x, y)} > 0, 



and Condition l2U\ holds. 

Then, there exist constants 6 > and w G (1, oo) such that 

V(w) 

PV(x, w) < V(w) - 6—-^I {w G [w, oo)} + Mwl {w G (0, w)} . 

w 

where V{x,w) := V{w) := 4>{w). The constants 6 and w can be chosen to depend 
on ao, (f> and Mw only. 

Proof. We can estimate 
PV{x,w) - V{w) 

g(x,d?/)(5,y(du)min |l,r(x,?/)^| [0(u) - (f){w)] 
< Mw - J J q{x,dy)Qy{du) min |l, r(x, ?/)^|l {m < w} [0(w) - 0(m)] 

<Mw-<j){w) I g(x,di/)min{l,r(x,i/)} / Qy{du)- l-'^^^^ 

because min{l, ah} > min{l, a} min{l, b} for all a,b > 0. The convexity of </> im- 
plies 20(tf;/2) < 1 + </)(«;), and therefore lim sup 0(tf/2)/0(u;) < 1/2. Because 
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Iu<w/2Qyi^'^)'^ = 1 ~ /«>t«/2^y('^")"' ^PP^y Lemma EU Now, for any 

So e (0, Q!o/2), there exists wq G (1, oo) such that 

PV{x, w) - V{w) <Mw- 5o^^ for all w G [wq, oo). 

w 

The claim follows by taking w G [t^o, oo) sufficiently large such that (f){w)/w > 
M\y/So for all w G [w, oo). □ 

In practice, Condition [201 is often verified for moments, that is, (f>{w) = w^. We 
record the following corollary to highlight the straightforward connection of (3 to 
the polynomial drift rate. 

Corollary 23. Suppose the conditions of Proposition l2E hold with (f){w) = w'^ + 1 
for some /3 > 1. Then, the pseudo-marginal kernel satisfies the drift condition 

PV{x, w) <V{w)- 5V^ (w) + bvl {w G (0, w) } , 
where V{w) := ty^ + 1 and by := Mw + 6V^{w). 

Proof. Follows from Proposition [2^observing that w < = V{wY^^. □ 

Proposition [22] and Corollary [23] establish a drift towards the set X x (0, w]. We 
are left with showing that the set (0, w] is small. 

Lemma 24. Suppose there exists e > 0, an integer n G [1, oo) and a prob- 
ability measure v on (X, i5(X)) such that denoting the (sub-probability) kernel 
Pacc{x,A) := J^q{x,dy)mm{l,r{x,y)} then for anyAeB{X), 

(18) P^Jx, A) > eu{A) for all x G X. 

Then, there exists Wq G (1,oo), e > and a probability measure u on (X x 
W, B{X) X B{\N)) such that for all w G [wo, oo), 

P"(x, w; ■) > — i/( ■ ) for all (x, w) G X x (0, w] . 

Proof. Choose t^o > 1 sufficiently large so that ew '■= infj^gx / Qy{du) min{w;o, u} > 
0; such wq exists due to Lemma UT\ because 



Qy{du) m.m{wo,u} > / Qy{du)u = 1 — / Qy{du)u. 

J U<Wq J u>wo 

We may write for A x B e B{X) x B{\N) and for w G (0, w], 



Pi.,.-,A,B)> /,(..d,)/Q„(d.)m,„|l.r(..,)^ 

J A J B y W 

> I g(x,d?/)min{l,r(a;,?/)} / Qy(du) min < 1, — 

J A Jb I ^ 

> - [ P^c{x,dy)Pw{y,B), 
w J 



CONVERGENCE PROPERTIES OF PSEUDO-MARGINAL MCMC 



17 



where Pwiy.B) = f^Qy{du)mm{wo,u}. We deduce recursively that 





i>o{A X B). 



We may take u{A x B) = uq{Ax B) /uq{X x W) and e = ei>o(X x W) > 0. □ 

Remark 25. The condition in flTH]) is more stringent than assuming P uniformly 
ergodic. However, it is the most common way to establish the ra-step minori- 
sation condition P"(x, ■) > ez^( ■ ) in practice, which holds if and only if P is 
uniformly ergodic. In the case of a continuous state-space X where q{x, {y}) = 
and I'dy}) = for all x,y eX and n = 1, the condition in (ITSi) is in fact equivalent 
to P(x, ■)>eu{-). 



6. Sub-geometric ergodicity with an IMH as marginal algorithm 

The independent Metropolis-Hastings (IMH) algorithm is a specific case of the 
Metropolis-Hastings in ([1]) corresponding to a proposal q{x, dy) = q{dy) for all 
a; G X, such that n <^ q. The IMH can often be made uniformly ergodic by 
choosing q to have heavier tails than tt, in which case the results in Section [5] 
are applicable. However, the uniformity assumptions required in Section [5] can be 
relaxed. Firstly, we may consider situations where the marginal IMH algorithm is 
not uniformly ergodic, but for example polynomially ergodic. Secondly, uniform 
integrability of the weight distributions {Qx}x£X is not a requirement. The results 
of this section may be relevant for example to the analysis of the Particle IMH-EM 
algorithm presented in [5|. Our results are inspired by [l^] establishing polynomial 
ergodicity and [lH exploring other sub-geometric rates for the IMH. 

Proposition 26. Denote fi{x) := 7r(dx)/g(dx). Suppose that there exists a strictly 
increasing 4> '■ (0, oo) — )■ [1, oo) with liminft^oo 4>{'t)/'t > 0, such that 



Then, there exists constants M, c, e G (0, oo) and a probability measure u on 



(19) 




(X X W, BiX) X i3(W)) such that for all {x,w)eXx\N 



(20) 



PV{x, w) < V{x, w) — c 



V{x, w) 



fi{x)w > M 



(l)~^{V{x,w))' 



(21) 



fi{x)w < M, 



and I'iV) < oo, where V{x,w) := (f)[jj{x)w). 
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Proof. Denote A^,^ := {(y, m) G X x W : > l} and i?^,^ := AI^^ and write 

PVi.,.) ^ ^.(d.)g,(dn) + /^^^ ^^.(d.)g,(dn) 
+ (l-^).(d.)Q,(d.) 

< -7^ [ Hdy, du)V{y, u) + V{x, w) (l - —^j^f^) , 

because n{y)u > n{x)w on A^^w The first term on the right vanishes and 
T^{Rx,w) — >■ 1 as fi{x)w — )■ oo, and hminf^^oo m/0~^('u) > 0, implying (l20i) . 
For (l2Ti) . observe that for fi{x)w < M, 

P{x,w;B) > |^min{^,-^}^(d|/,d«) =: z>(S), 

and we can take e = i>(Xx W) and = e^^u, for which f[T^ imphes i^iV) < oo. □ 
Corollary 27. If for some 7 > 0, 

y TT (dx, dw) exp [(/i(a;)ty)'''] < cxd, 

t/ien i/iere exzsi constants M, c, cy G (0, 00) swc/i i/iai for fi{x)w > M , we have 
the drift 

PV{x, w) < V{x, w) — CK,{y{x, w)) , 
where V{x,w) = exp )'^) and K{t) = t{\ogt)~^/'^ . 

Proof. Proposition [261 applied with (j){t) = exp(t'^). □ 

The type of drift in Corollary [271 implies faster than polynomial sub-exponential 
rates of convergence; see for example [lol |. 

Corollary 28. If for some /3 > 1 

/*(dx.du>)(Mx)u.)'<oo. 

then there exist constants M, c, cy G (0, 00) such that for fi{x)w > M, we have 
the polynomial drift 

PV{x, w) < V{x, w) - c1/"(x, w), 
where V{x, w) = (/i(x)w)^ + 1 and a = 1 — 1/ (3 . 

Proof. Proposition applied with (f){t) = + 1 implies that for all fi{x)w > M, 
~ Vix w) 

PV{x, w) < V{x, w) - a ^ ' \ < V{x, w) - cy"(x, w). □ 

{V{x, w) — 1) 

The type of drift in Corollary [28] implies polynomial rates of convergence; see 
for example We notice that the result suggests that the pseudo- marginal 

algorithm may have a similar rate of convergence as that of the marginal algorithm. 
This is in contrast with the situation where the marginal algorithm is geometric 
and the weights unbounded. 
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7. Polynomial ergodicity with a RWM as marginal algorithm 

We consider next conditions to check a polynomial drift condition for the 
pseudo-marginal algorithm in the case where the marginal algorithm is a geometri- 
cally ergodic random- walk Metropolis (RWM), which targets a super-exponentially 
decaying target with regular contours [l3|. The existence of such a drift, together 
with additional simple assumptions, implies polynomial rates of ergodicity, but 
also Condition [T3] (essential for the convergence of the pseudo-marginal asymp- 
totic variance to that of the marginal algorithm) and a central limit theorem for 
example. 

Our results rely on moment conditions on the distributions Qx{dw). In Section 
17.11 we assume the moments to be (essentially) uniform in x, while in Section [72] 
we consider the case where the behaviour of Qx{dw) can get worse as \x\ — )■ oo. 
It is possible to extend our results beyond the polynomial case. For example one 
may assume the existence of exponential moment conditions; see Remark [33 For 
the sake of clarity and brevity, we have opted to detail only the polynomial case 
here. 

Throughout this section, we denote the regions of almost sure acceptance and 
possible rejection for the marginal and pseudo-marginal and algorithms as 

^.:=(.GX : ^^^^>lj, Rx:=Al 
I 7r(x) J 

A,,^ := \{z,u) G X X W : ![(^_Lf)^ > il r ,= aI^, 

respectively, for all x G X and w G W. 

7.1. Uniform moment bounds. Consider the following moment condition on 
the distributions {Qx}xex where X = M'^. 

Condition 29. Suppose there exist constants a' > and /3' > 1 such that 
(22) Mw ■■= ess sup / (w""' V w^')Qx{dw) < oo, 

xGX J 

where a\/ b := max{a, b} and the essential supremum is taken with respect to the 
Lebesgue measure. 

We first establish the following simple lemma, used throughout this section, 
which guarantees that the moment condition above holds also for any intermediate 
exponents. 

Lemma 30. Given f l22|) . then for all a G [0,a'] and (5 G [0,/3'] and any 7 G 
[-«',/?] 

esssup / {w~^ y w^)Qx{dw) < Myy and esssup / w'^Qx{dw) < M\y. 
xex J xex J 

Proof. The first inequality follows by observing that w~°'\/ < w~"' M w^' for all 
w > 0. For the second one, suppose first that 7 G [0, /?']. Then, w"' < V w'^, 
and the result follows from the first inequality. The case 7 G [—a', 0] is similar. □ 



The following condition for the target density vr was introduced in [13 
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Condition 31. The target distribution tt has a density with respect to the 
Lebesgue measure (also denoted tt) which is continuously differentiable and sup- 
ported on Mf^. The tails of vr are super-exponentially decaying and have regular 
contours, that is, 

X X V7r(x) 

lim — ■ Vlog7r(x) = — oo and limsup - — - ■ < 0, 

\x\^oo \x\ l^.|^oo fI |V7r(a:)| 

respectively, where |x| denotes the Euclidean norm of x G Mf^. Moreover, the 
proposal distribution satisfies q{x, A) = q{A—x) = q{y—x)dy with a symmetric 
density q bounded away from zero in some neighbourhood of the origin. 

The following theorem establishes a polynomial drift given the conditions above. 

Theorem 32. Suppose P is a pseudo-marginal kernel with distributions Q^^dw) 
satisfying Condition\^with some constants a' > and P' > 1, and that the cor- 
responding marginal algorithm is a random walk Metropolis with invariant density 
71 and proposal density q satisfying Condition EH 
Define : X x W — )■ [1, oo) as follows 

(23) V{x,w) := c2.'n'~^{x){w~" y w^) where := sup 7r(2;), 

for some constants rj G (0, a' A 1), a G (?7, a'] and (3 G (0, (3' — rf). 

Then, there exists constants w, M, 6 G [1, oo), w G (0, 1] and 6v > such that 



(24) PV{x,w)< 



/3-1 

V{x,w) — 6vV " {x,w), for all {x,w) ^ C, 
6, for all (x, w) G C, 



where C := {(x, u;) G X x W : |x| < M, w G [w^, u^]}. 

Moreover, b, 6v and C depend only on the marginal algorithm, the constants 
a',P' and Mw in Condition\2^ and the chosen constants a,P,r]. 

Proof. Let w G [l,oo) and 6y > he as in Lemma [36| so that PV{x,w) < 

y(x, w) — SyV f (x, w) for all X G X and all w > w. Then, apply Lemma 1371 with 
the fixed value of w to obtain a M G [1, oo) and A G [0, 1) such that 

(25) PV{x, w) < A1/(x, w) = V{x, w)-{l- \)V{x, w), 

for all w G {0,w] and |x| > M. Lemma 138) implies that fl25|) holds with all x G X 
and w G {0,w\, with some A' G [0,1). Because V > 1, we conclude the claim 
for {x,w) ^ C with 6v '■= min{(5y, 1 — A, 1 — A'}. Lemma [38] implies the case 
(x, w) G C. 

The dependence on b, 6v and C is clear from the proofs of Lemmas [TTHSSI □ 

Remark 33. It is possible to generalise Theorem |32] for non-polynomial moments. 
Particularly, we may let V{x,w) = c^7i~^{x)(j){w) where (p : (0, oo) — )■ [1, oo) is 
defined by 

a{w), w G (0, 1] 
b{w), w G (1, oo) 

with non-increasing a : (0,1] — [l,oo) and non-decreasing b : (1, oo) — ?■ [l,oo) 
satisfying 

lim w~^a{w) = oo and lim b{w)/w = oo, 

10— >0+ w—>oo 
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and for some J > rj 

pi POO 

ess sup / a{w)Qx{dw) < oo and ess sup / b{w)w'^Qx{dw) < oo. 

x£X Jo agX Jl 

Note that a{w) and b{w) must grow at least polynomially as w — >■ 0+ and w oo, 
respectively. For example b{w) = exp(cbw) allows one to establish the claim with 
the stronger drift condition 

PV{x, w) < V{x, w) - 6v . ^^^'"^^ , {x, w) i C, 

log oV[x, w) 

instead of the polynomial drift in (123|) . 

Remark 34. We believe that the negative moment condition and the presence 
of in the drift function are not necessary in order to establish polynomial 
ergodicity in general. It seems, however, difficult to establish a one-step drift 
condition without any control of the behaviour of the distributions near zero. 

We first consider a simple result which is auxiliary to the other lemmas. 

Lemma 35. We have the following bounds for all x, z ^ X, w > 0, a > and 
/3 > 1. 

(i) / min|l,-|Q,(dM) > -fl - J- / u^Qxidu)] 

(n) / QxU^u) i-T^^ f w"°Q.-+.(d^). 

J{u:{z,u)eA^^^} \7r{x + z)J J 

Proof. The bound ([I]) follows by writing 

/min{l,-|g,(dn) = -fl- / (u - w)Qxidu)) > - ( 1 - [ nQ,.(dn)Y 

and using the estimate l{u> w} < {u/w)^~^. For (jn]), similarly 
/ Qx+z{du) = l- Qx+z{du) 

J {u:{z,u)eAa:,w} J \^U<W -^^^^^ j 

and use l\u < w-p4^} < u~"(w D 

I tt(x+z) i — V tt{x+z)J 

We next consider the case where w is large, and establish a polynomial drift in 
this case. 

Lemma 36. Suppose the conditions of Theorem\U^hold. Then, there exist con- 
stants 5y > and tZ; G [1, oo) such that 

fj~i 

PV{x, w) < V{x, w) — 6vV (x, w) for all x and w G [w, oo) . 



Proof. We may write for w > w > 1 

// ax,n,{z,u)Qx+z{du)q{dz)+ j j bx^nj{z,u)Qx+z{du)q{dz) 



PV{x,w) 
V{x, w) 
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where 

, ^ f vr(x) \''m~"Vm^ 

26 a^,^ := — — ■ — - 

\7r(x + z) J njP 

7r(x + z)\^"'' n^"" Vm^+^ / 7t(x + z) u 



(27) K,Uz,u) := \ ^ — + 1 

\ 7r(x) / W^+P \ 'IT[X) w 

We now estimate both integrals by partitioning their integration domains into 
their intersections with the acceptance and the rejection sets of the marginal 
algorithm. For notational simplicity we denote Ax^w r\ Rx = A^^w H {Rx x W) etc. 
The bound for the first integral is straightforward, 

// ax,w{z,u)Qx+z{du)q{dz) < ^ 



For the second one, observe that 1 < (~^~) ~) on Ax^w, implying 
ax.wiz, u)Qx+z{du)q{dz) 



- ^ /I .nn. " ^ ^ 



because P + t] < 13' . Similarly, because {^^^^^j^-^Y < 1 on Rx^w we have 



o J JrLx,w 

We now turn to the the crucial remainder, which approaches unity as w grows. 



_^ 7r(x + z) u 
7r(x) w 



Qx+z{du)q{dz) 



< 1 - //mill {l. iumjl. 0.+.(d")'((dz) 

<1--/ (l-^),(d.), 

77(3;) — J 

by Lemma [33 ([1]), where u G (0, 1). Lemma ESI dH]) in Appendix [D] implies the 
existence oi a u > such that inf^^gx ^'({-z : ^^^^^} > i^) > 0. Therefore, there 
exists a 1^2 E (0, u), such that whenever w is sufficiently large 

7l(x + z) u\ ^ / , N / , N 

1 - , , - Qx+z{du)qidz) < 1 - -. 



Because /9 > 1 , the terms of the order w ^ ot w ^ ^ vanish faster than w ^ when 
w increases. Consequently, we have for any G (0, z/2), by further assuming w 
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sufficiently large, that 

PV(x,w) < fl - — ) V(x,w) 
\ w / 

= V{x,w) - U3V''{x,w){c^7i~''{x)Y~'^ < V{x,w) - U3V''{x,w), 

where K = ^ e (0, 1). □ 

Next we deduce that in the regime where w is bounded, we have a geometric 
drift. 

Lemma 37. Assume the conditions of Theorem\3^hold and letw G [1, oo). Then, 
there exist constants X G [0, 1) and M G [1, oo) such that 

PV{x, w) < XV{x, w) for all w G (0, iv], \x\ > M. 

Proof. We may write 

where 

28 a^^^[z,u) := [——-] -1 

\7r(x + z) J w y wP 



a^^w{z,u)Q^+^{du)q{dz) + jl b^^y,{z,u)Q^+^{du)q{dz), 



(29) K,w{z,u) :-- 



nix + z)\^ ^ u 



tt{x) J w 



u y f-Kix + z^^^ 



Fix a constant c > 1 and define the following subsets A^^ := {z : ^'^^^^'^ > c} 

and := {z : < \}, and the annulus between these two sets as '■= 

\C _ . 1 ^ -k{x+z) 



{A^ U R^f = {z ■.\< ^ < c}. Comput 



e 



(30) / / a^^y,{z,u)Q^+^{du)q{dz) 



w-" V wP J 



and 



(31) / / h^^yj{z,u)Q:^+:,{(lu)q{dz) 

J Dx J (z,u)£Rx,w 

Let then 7 G (?7, a A 1) such that 7 + /3 < and observe that < 1 

on Rx,w, and thereby 



(32) / / bx^^{z,u)Qx+z{du)q{dz) 

' Rx J(z,u)£Rx,w 

< I I (^±±AY' ^'-^^ZQ..zidu)qidz) 

- jRJiz,u)^Rx,A } W^-^yw^+P"^'^'^ '''^ ' 
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Similarly, observe that { ^'^^^^^ < 1 on A^^w and so 



(33) /_ / a^^^{z,u)Q^+:,{du)q{dz) 

J Rx J {z,u)&Ax,-w 



It holds that 1 < ) on i?^.^, so we have 

(34) /_ / h^^y,{z,u)Qx+z{du)q{dz) 



'Ax J {z,u)eRx 

We are left with the term that will yield the geometric drift when |x| is large, 



ax,w{^, u)Qx+z{du)q{dz) 
< - / Qidz) [ Q.+.(dn) 

W '^VW^ JX^ J{u:{z,u)eAxM 

by Lemma [35] dH]) . Lemma 153) f pij) implies that 6 := liminf |2,.|^oo Q'(^x) > 0. 
Let 6' G (0, 6) and fix e > sufficiently small so that 6e — 5(1 — e)^ < —6', 

and let c > 1 be sufficiently large so that Mwc~'^ < e and My/i^^Y ^ ^i and 
also that all f p2|) . f l33|) and f lM|) are bounded by e. Condition [31] implies that 
limsup|^l^o^ g(i5a.) = 0, and therefore there exists M = M(c, e) > such that 
f[30|) + f[3T]) < e for all |x| > M. By possibly increasing the bound M to ensure 
that q{Arc) > 5(1 — e), we have that the claim holds for all > M with the 
constant X = 1 — 6'. □ 

We complete the results above by considering in particular very small values of 

w. 

Lemma 38. Suppose the conditions of Theorem\3^ hold, and let tv, M E [1, oo). 
Then, there exist constants w G (0, 1), A G (0, 1) and b G [1, oo) such that 

(35) PV{x,w) < b, for \x\ < M and w G [w,w] 

(36) PV{x,w) < XV{x,w), for X E X and w E {0,w\. 
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Proof. From the proof of Lemma [371 we have 
PV(x,w) 



- \ II Qx+zidu)q{dz) I +ax,w + bx,w, where 




7i{x) Y V 



tt(x + z)J w~°' V 
n(x + z) \ u u^" V u 



(5a;+2(du)g(d2;) 

Qx+z{du)q{dz). 



7r{x) J ww^" y 
Because i^^.^Y' < 1 on and (l^itV-'" < 1 on i?,, 

V 7r(x-|-2) uj — \ Tr(x) w J — 

u^-^yu'^+P „ M, 



-'x,w 



This is enough to show that PV{x, w) < {1 + Mw)V{x, w) for all (x, w) G X x W. 
Because V is bounded on {|x| < M,w G [w, w]}, this implies the existence of 
b = b{w, w, M) < CO such that (135|) holds. 

Consider then (136|) . Let 5 > be small enough so that inf^-gx Q'(^x) > e > 0, 
where Al^ := : '^^^'^^^ > ^j. Then, 

/ / Qx+z{du)q{dz) > / q{dz) / Qa;+2(dM) 

>/^^,(d.)(i-M„.(|)"')>i 

for G (0, ty] if i/7 is small enough. We may further decrease w to ensure that 

Ox.-u; + bx.w 

< e/4 for all w G (0, w] and conclude ([36]) with A = 1 - e/4. □ 

7.2. Non-uniform moment bounds. We replace the uniform moments in Con- 
dition [2n] here with the following assumption, which allows the moments of the 
distributions {Qx}x€X to grow in the tails of n. 

Condition 39. Let w : X — t- [1, oo) be a function bounded on compact sets and 
tending to infinity as |x| — oo. Let ip : (0, oo) — )■ [1, oo) be a non-increasing 
function such that '?/;(t) — > oo as t — > 0, and define g{x) := iP{tt{x)). 

(i) There exist constants a' > and /3' > 1 such that 



esssupf? (x) u " y u'^ Qxidu) < 1, 
xex J 

where the essential supremum is taken with respect to the Lebesgue measure, 
(ii) There exist constants G (0, — 1) and G (0, /?' — 1 — S,w), 



(37) sup ^, . . sup 

a;6X W^-[X) ^g/J^ 



7r(x + z)\^'' g{x + z) 



< oo. 



where Rx '■= {z : ^^^(^^^ < 1} is the set of possible rejection for the marginal 
random-walk Metropolis algorithm. 
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(iii) For any constant 6 > 1, one must have 

Mw{h{\x\yi)) 

(38) sup \ < oo. 

where Mw '■ (0, oo) — )■ (0, oo) is defined as follows 

^w{f) :=esssup / u~°' \/ Qx{du) < ess sup (7 (x), 

x|<r' J \x\<r 

where the essential supremum is taken with respect to the Lebesgue measure. 

Condition [5^ may appear rather implicit and technical at first. However they, 
together with additional assumptions required in Theorem HD] below, are implied 
by the more meaningful assumptions in Condition |4T] and Corollary |42l whose 
proof may help the reader gain some intuition. 

Theorem 40. Suppose P is a pseudo-marginal kernel corresponding to a random 
walk Metropolis with invariant density vr and increment proposal density q satis- 
fying Condition\3^ Suppose Condition\3^ holds with some a' > and > 1. 
Define V : X x W — [1, 00) as in (l23i) . where the constant exponents satisfy 

(0,a'A(/3'-l-e«,) A(l-^,)), a'], /? G (1 + - r/, /?' - r/) 

andri < (/?' - /3) A 1 - 

Furthermore, suppose that there exists a function c : X — )■ [l,oo) hounded on 
compact sets such that limsup|2.|^(,o c(a;)e~'* < 00 and 

w^^ (x^ 

(39) limsup ^} = where e (0, [(/?' - /3) A a A 1] - - ^^), 
and that for any constant 6 G [1, 00) 

I \ ^ wix] 

(40) limsup Mw{b\x\) max < q{Dx 



c^(x) ' \ c(x) 

where := {z : < < c{x)} . 

Then, there exist constants w,M,b G [l,oo), w G (0,1] and Sy > such that 
the polynomial drift inequality ([M]) holds. Furthermore, the constants depend only 
on those of the marginal algorithm, the quantities a', l3',^y^,^T^, ip, w involved in 
Condition\^ including the upper hounds in (I57|) and (155]) (as a function ofh), 
the chosen rj, a, P, c and C,c, o-nd the upper hounds ( 13 9 p and ( I40p . 

Proof. The proof follows by applying Lemma H3] below and then Lemma HH with 
from Lemma B5| similarly to the proof of Theorem I5^by setting W '. — SUP|2,|<jy./ if(x), 
and observing that V is bounded on C . The dependence on the various quantities 
is clear from the proofs of Lemmas |43] and HH □ 

Before proving Lemmas |13] and HU we give sufficient conditions to establish the 
conditions of Theorem HHl 

Condition 41. Suppose Condition [3T] holds and additionally there exists a con- 
stant p > 1 such that 

X 

lim ■; — — ■ VlogTrfx) = —00. 

IxKoo \x\P 
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Moreover, the increment proposal density q satisfies q{x) < q{\x\) for some bounded 
differentiable non-increasing function q : [0, oo) — j- [0, oo) such that ^y^q{\x\)dx < 
oo. 



Corollary 42. Suppose Condition^^is satisfied, and that 

(41) j V u^'QMu) < c(l V Ixl)"' 

with some constants c < oo and p' G [0,p — 1). Then, for any 

7/ G (0,a' A (/?'- 1) A 1), aeir],a'], /3 G (1 - r/, - r/) 

and V defined in fl23|) . the drift inequality f l24|) holds, with constants w,M,b G 
[l,oo), w G (0,1] and 6v > only depending on the marginal algorithm and 
a',P',c,p' in f HT]) and the chosen a,P andrj. 

Proof. Choose the constants C,w and sufficiently small so that the conditions on 
rj, a and /3 in Theorem HOl are satisfied. 

Fix a unit vector u EMf^ and define the function ip : — [1, oo) such that 



r>Ro 
r G [0,i?o) 



where Rq G [1,oo); this is always possible because the function r i-> Ti{ru) is 
bounded away from zero on compact sets and monotone decreasing on the tail. 

Define then g{x) = Cgip^ (7r(x)), where the value of the constant Cg > 1 will be 
fixed later. In order to guarantee that Condition is satisfied for sufficiently 

large Cg, it is sufficient to show that 

(42) limsup (7"^(x)c|x|'' < oo. 

\x\—^oo 

Due to Lemma [52] in Appendix [D| if |x| is sufficiently large, then g{x) = g{(x\x\u) 
for some G [b^^,b], where b G [l,oo) is a constant. Therefore, g~^{x) < 
{b-^\x\)-p\ implying (|i2D. 

Define then w{x) := (^''"(x), where = V G (1, oo). It is easy to check 
similarly to fH21) that 

g{x) , Mw{bi\x\yl)) ^ c'{b\x\y' ^ 

sup , . + -TTT-^ < 1 + sup < oo. 

xex [X) w^^ [X) xex w^^" [x) 

It is also easy to check that 

' 7r{x + z)Y'' g{x + z)] \ f 7r{x + z)^'' f tjj{7r{x + z)) 



sup 



tt{x) J g{x) 



sup 



7r(x) J V ip{7i{x)) 



is uniformly bounded in a; G X. This is because it is sufficient to check the 
condition in the tails along a ray, that is, only for z = r\x\, r > 1. We conclude 
about the existence of a constant Cg G [1, oo) such that Condition 139) holds. 

Choose ec G (0,p — 1 — p') and let c{x) = exp(|a;|'^'=). It is easy to check that 
there exists such that fl5^ and fHUl) hold, using Lemma [Ml in Appendix [D] to 
estimate q{Dj.). □ 

We start by establishing a polynomial drift when w is large. 
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Lemma 43. Suppose the conditions of Theorem\4^hold. Then, there exist con- 
stants G [1, oo) and 5y > such that letting w{x) := Cww{x), 



/3-1 



PV{x,w)<V{x,w) — SvVi^ {x,w) for all X E M!^ and w G [w{x), oo) . 
Proof. We may write 



PV{x,w) 
V{x, w) 



where a^^^ and b^^w are defined in (12^ and fl27|) . respectively. 

In what follows, for any z/ > 0, we will denote by b,y G (0, oo) a constant chosen 
so that for all x G X, {x + ^ : ^^^^ > J^} C B{0,b^{\x\ V 1)); see Lemma [53] (0) 
in Appendix ini We also denote by c G [1, oo) a constant whose value may change 
upon each appearance. 

For the first integral, note that on A^^w, 1 ^ i ^^wix)^ wY ^ denoting 5 : = 
Tj + /3 — 1 — > 0, we have for w > w{x), 



IL 



ax,w{^,u)Q^+,{du)q{dz) < // Qx+,{du)q{dz) 



< 



w 



1+5 



M^(6i(|x| Vl)) ^ ^ c 



w^^ (x) 



w 



1+5' 



by Condition [39] dm]). For the second one, let 7 G {rj + ^t,, f3' — f3], 7 < 1, and 
observe that 1 < {^^^^^^y-^Y on Ax^ui, implying that with 5' := 7 + /3 — 1 — .^^ > 



ax,w{z, u)Qx+z{du)q{dz) 



-^x,w^Rx 



< 



< 



n{x + z) 



1+5' 



W 



Tx{x) ) w'y+i^ 

7i(x + 2) \ g(x + z) 



Qx+z{du)q{dz) 



Rx 



TX[X] 



w^^ (x) 



q{dz) < 



w 



1+5' 



whenever w > w{x), by Condit ion 1391 fjl|l and (|ii|). Similarly, because (^^^^^:^)^ < 
1 on Rx^w we have for w > w{x) 



Rx.w^Rx 



TTiX) 



< 



W 



1+13 



Qx+z{du)q{dz) 



tt[x + z)\ g[x + z 



TT X 



9{x) . , c 
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and similarly, because {^^^^y^-^Y < 1; 



^ 1 / M,4.(&i(|x| Vl)) \ c 



As in the proof of Lemma [361 ^6 may apply Lemma [35] ([i]) to obtain 
n{x + z) u 



< 1 



7r(x) w 

V 



77(3;) J 



Qx+zidu)q{dz) 



/■ ..J, 1 /Mv^(6^(|a;| V 1)) 



<1--/ _ g(d2) 1 



<!-- [ , , g(dz)fl 



^ 7r(xl — J 



ir(a;) 

where we may choose u G (0, 1) such that mix^xQ{z '■ ^^^^^ > > 0; Lemma 



n]) ensures the existence of such a i/. 
The terms of the order i£i~(^+<^) or vanish faster than as increases. 

Consequently, we can choose G [1,cxd) sufficiently large so that there exists a 
i^' > such that for all x G X and w > w(x), 



PV{x,w) < 1 V{x,w 



w 



= V{x,w)-5vV''{x,w){cy-'^{x)Y " < V{x,w)-5vV^{x,w), 

where K = ^ G (0,1). □ 

Our last lemma concentrates on the cases where either is large and w 
bounded, or w is small. 

Lemma 44. Assume the conditions of Theorem[JU\ hold and let w{x) := c^w{x) 
for some constant G [1, 00). Then, there exist constants A G (0, 1), G (0, 1), 
M G [1, 00) and Cy G [1, 00) such that 

(43) PV{x,w) < \V{x,w) for \x\ > M,w E (w,w{x)] 

(44) PV{x, w) < XV{x, w) for X eX,w e (0, w] 

(45) PV{x, w) < cvV{x, w) for {x, w) G X x W. 
Proof. We may write 

PV(x,w) ff f f 

where dx,w and b^^w are given as in f l25]) and 
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Define the subsets := {z : ^^^^^ > c(x)}, Rx ■= {z : ^^^^^ < ^;(^} and 
Dx := {A,UR,)^ = : < ^Igil < c{x)}. Lemma El in Appendix |D] 

implies the existence of 61 G [1, C)o) and Mq G [1, C)o) such that A^. U + x C 
B(0, bi{\x\ V 1)) for all x G X. We decompose the two sums above into sub-sums 
on Ax and Rx, with again an obvious abuse of notation. 

Observe that 1 < (^^)' on Ax,^ and {^^Y'" < 1 on i?,,^, implying 

(46) // ax,w{z,u)Qx+z{du)q{dz) + bx,w{z,u)Qx+z{du)q{dz) 

< / / -^^Qx+z{du)q{dz) 

^ M^{h^{\x\y\))q{Dx) 

because ?7 < (/3' — /3) A a. 

Let then 7 := r/+{^+^c < (/3'-/3)AaAl and notice again that (^Jgf^^)^"^ < 1 



on Rx,w and { ^(^^^y ^^ < 1 on Ax^w Therefore, 

/ /_ dx,w{z, u)Q x+z{du)q{dz) + I bx,n,{^,u)Qx+z{du)q{dz) 

~ JrA J J w^'-^yw^+P^''^'^ '^^ ' 

^ 1 f w^^{x) \ r [ / 7r{x + z) y^ g{x + z) ^ g{x) 

~ w^-"'y w^y+f^yc^-ix) J JrA\ vr(x) J g{x) 

because ^^^^^ < c~^{x) on Rx- 

It holds that 1 < (^^^f ) on i?^,^, so we have 




hx,u,{z,u)Qx+z{du)q{dz) 

tt(x) Y f u-"yu^ 



< / ( ' ^ / „ Qx+.{du)q{dz) 

' 7r{x + z)J J(^z,u)eR:.,^ w ^ywP 



^ Mw{hi{\x\y l))c-'\x) 



Similarly, 




ax,w{z,u)Qx+z{du)q{dz) 



Ax J {z,u)eAx^u 



1/ yla; Pl-Aa; ij; 
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Now, by Lemma 



Q,+,(du)g(dz) > / ( 1 - ( ) / n-"'Q,+,(dn) ]q{dz) 



MH.(6i(|x|Vl))cS'f^ 



cix) 



for all w e (0, Cu;w{x)]. 

Lemma [531 fjml) in Appendix [Pl implies that 6 := liminf|a,|^oo ^(yla;) > 0. Con- 
dition |39] together with fl39|) and f l40|) imply 

(47) hmsup ^ ' <l-S, 

\cc\^oo V{X,W) 

and we may conclude (jlS]), by choosing any AG (1 — 5, 1) and finding a sufficiently 
large M G [1, oo) such that the claim holds. 

Consider then (144|) and assume < M. It is easy to verify that (147|) holds with 
some (5' > when taking limsup^^Q^ in the terms of the earlier decomposition. 
Finally, it is easy to check that (145|) holds for |x| < M similarly as ( H6|) . and the 
general case follows from (143|) and Lemma |43l □ 

8. Concluding remarks 

Our convergence rate results in Sections 3 and 5-7 allow one to establish central limit theorems. In the case where the pseudo-marginal kernel is geometrically ergodic, that is, $\tilde P$ admits a non-zero spectral gap as discussed in Section 3, the central limit theorem (CLT) holds for all functions $f : X\times W\to\mathbb{R}$ such that $\tilde\pi(f^2)<\infty$ [26, Corollary 2.1]. Specifically, we have for all $g : X\to\mathbb{R}$ with $\pi(g^2)<\infty$

$$\frac{1}{\sqrt{n}}\sum_{k=0}^{n-1}\big[g(X_k)-\pi(g)\big] \to N\big(0,\mathrm{var}(g,\tilde P)\big)\quad\text{in distribution,} \tag{48}$$

where $\mathrm{var}(g,\tilde P)\in[0,\infty)$ is given in Definition 5. It is possible to deduce upper bounds for the asymptotic variance $\mathrm{var}(g,\tilde P)$. Namely, Corollary 10 relates $\mathrm{var}(g,\tilde P)$ to $\mathrm{var}(g,P)$, and from Lemma 47 (51), with $\bar g := g-\pi(g)$, since the spectral measure $e_{\bar g,P}$ is supported on $[-1,1-\mathrm{Gap}(P)]$, where $x\mapsto(1+x)/(1-x)$ is increasing,

$$\mathrm{var}(g,P) \le \frac{1+(1-\mathrm{Gap}(P))}{1-(1-\mathrm{Gap}(P))}\int\bar g^2(x)\,\pi(dx) = \frac{2-\mathrm{Gap}(P)}{\mathrm{Gap}(P)}\,\pi(\bar g^2).$$

If the spectral gap of the marginal algorithm is not directly accessible, it can be bounded by the drift constants; see [6] and references therein, and also [16, Theorem 4.2 (ii)].
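The bound above can be checked numerically on a small example. The sketch below is our own illustration, not part of the paper: it assumes a three-state target and a Metropolis-Hastings kernel with uniform proposal (the helpers `mh_kernel`, `solve`, `asymptotic_variance` and `right_spectral_gap_3state` are hypothetical names), computes $\mathrm{var}(g,P)$ through (51) in the form $2\langle\bar g,(I-P)^{-1}\bar g\rangle_\pi-\pi(\bar g^2)$, and takes $\mathrm{Gap}(P)$ to be one minus the largest eigenvalue below one, which is the quantity the bound uses.

```python
import math


def mh_kernel(pi):
    """Metropolis-Hastings kernel on {0, ..., m-1} with uniform proposal q(x, y) = 1/m.

    The resulting kernel is reversible with respect to pi.
    """
    m = len(pi)
    P = [[0.0] * m for _ in range(m)]
    for x in range(m):
        for y in range(m):
            if y != x:
                P[x][y] = min(1.0, pi[y] / pi[x]) / m
        P[x][x] = 1.0 - sum(P[x])  # proposing x itself, plus rejections
    return P


def solve(A, b):
    """Solve A h = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (M[r][n] - sum(M[r][c] * h[c] for c in range(r + 1, n))) / M[r][r]
    return h


def asymptotic_variance(P, pi, g):
    """Return (var(g, P), pi(gbar^2)) using (1 + x)/(1 - x) = 2/(1 - x) - 1."""
    n = len(pi)
    mean = sum(p * v for p, v in zip(pi, g))
    gbar = [v - mean for v in g]
    # I - P + 1 pi^T is invertible; because pi(gbar) = 0, its solution h
    # satisfies pi(h) = 0 and therefore (I - P) h = gbar.
    A = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    h = solve(A, gbar)
    norm2 = sum(p * v * v for p, v in zip(pi, gbar))
    return 2.0 * sum(p * u * v for p, u, v in zip(pi, gbar, h)) - norm2, norm2


def right_spectral_gap_3state(P):
    """1 - max(lambda_2, lambda_3) for a reversible 3-state kernel.

    The eigenvalues other than 1 solve t^2 - (tr P - 1) t + det P = 0.
    """
    s = P[0][0] + P[1][1] + P[2][2] - 1.0
    d = (P[0][0] * (P[1][1] * P[2][2] - P[1][2] * P[2][1])
         - P[0][1] * (P[1][0] * P[2][2] - P[1][2] * P[2][0])
         + P[0][2] * (P[1][0] * P[2][1] - P[1][1] * P[2][0]))
    return 1.0 - (s + math.sqrt(max(s * s - 4.0 * d, 0.0))) / 2.0


pi = [0.5, 0.3, 0.2]
P = mh_kernel(pi)
var, norm2 = asymptotic_variance(P, pi, [1.0, -2.0, 4.0])
gap = right_spectral_gap_3state(P)
print(var, (2.0 - gap) / gap * norm2)
```

For this target the non-unit eigenvalues are $1/3$ and $1/9$, so $\mathrm{Gap}(P)=2/3$ and the bound reads $\mathrm{var}(g,P)\le2\pi(\bar g^2)$.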

When $\tilde P$ is polynomially ergodic, the class of functions $g$ for which the CLT holds is related to the exponent in the polynomial drift. For the convenience of the reader, we reformulate here a result due to Jarner and Roberts [14].


Theorem 45. Suppose $P$ is irreducible and aperiodic. Assume there exist $V : X\times W\to[1,\infty)$, $\alpha\in[0,1)$, $b\in[0,\infty)$, $c\in(0,\infty)$ and a petite set [e.g. [ij, \2i] $C\in\mathcal{B}(X)\times\mathcal{B}(W)$ such that

$$\tilde PV(x,w)\le V(x,w)-cV^\alpha(x,w)+b\,1\{(x,w)\in C\}, \tag{49}$$
and that there exists $\eta\in[1-\alpha,1]$ with 7r(\^^'') < oo and

\9{x)\ ^ 
sup -— — — < oo, 

then $\mathrm{var}(g,\tilde P)\in[0,\infty)$ and the CLT (48) holds.

Theorem 45 is a restatement of [3, Theorem 4.2], because the pseudo-marginal kernel $\tilde P$ is also irreducible and aperiodic if the marginal kernel $P$ is. The asymptotic variance can also be upper bounded in the polynomial case; see [3] and [16, Theorem 5.2 (ii) and Remark 5.3]. It is also possible to deduce non-asymptotic mean square error bounds [16].

Finally, some of our results apply directly to extensions of pseudo-marginal algorithms which directly make use of noisy estimates of the marginal's acceptance ratio [15, 22]. However, despite some similarities and simplifications, the corresponding processes differ fundamentally in that $(X_k)_{k\in\mathbb{N}}$ is a Markov chain in this case (as opposed to the pseudo-marginal scenario), and we are currently investigating these differences.
Appendix A. Lemmas for Section 2

In this section, $(X,\mathcal{B}(X))$ is a generic measurable space and $\mu$ is a probability measure on $X$. We consider the Hilbert space

$$L^2_0(X,\mu) := \{f : X\to\mathbb{R} : \mu(f)=0,\ \mu(f^2)<\infty\},$$

equipped with the inner product $\langle f,g\rangle_\mu := \int_X f(x)g(x)\,\mu(dx)$. We denote the corresponding norm by $\|f\|_\mu := \langle f,f\rangle_\mu^{1/2}$ and the operator norm for $A : L^2_0(X,\mu)\to L^2_0(X,\mu)$ as $\|A\| := \sup\{\|Af\|_\mu : \|f\|_\mu=1\}$.

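Lemma 46. Let $P_1$ and $P_2$ be two Markov kernels on $X$ reversible with respect to $\mu$, and define the family of interpolated kernels $H_\beta := P_1+\beta(P_2-P_1)$ for $\beta\in[0,1]$, which are also reversible with respect to $\mu$. Then,

$$A_\lambda(\beta) := (I-\lambda H_\beta)^{-1}(I+\lambda H_\beta) = I+2\sum_{k=1}^\infty\lambda^kH_\beta^k$$

is a well-defined operator on $L^2_0(X,\mu)$ for all $\lambda\in[0,1)$ and $\beta\in[0,1]$, and so are the right-hand derivatives, with limits taken with respect to the operator norm,

$$A'_\lambda(\beta) := \lim_{h\to0+}h^{-1}\big(A_\lambda(\beta+h)-A_\lambda(\beta)\big) = 2\lambda(I-\lambda H_\beta)^{-1}(P_2-P_1)(I-\lambda H_\beta)^{-1},$$
$$A''_\lambda(\beta) := \lim_{h\to0+}h^{-1}\big(A'_\lambda(\beta+h)-A'_\lambda(\beta)\big) = 2\lambda(I-\lambda H_\beta)^{-1}(P_2-P_1)A'_\lambda(\beta),$$

for all $\lambda\in[0,1)$ and $\beta\in[0,1)$.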

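Proof. The expression for $A_\lambda(\beta)$ follows by the Neumann series representation $(I-\lambda H_\beta)^{-1}=\sum_{k=0}^\infty(\lambda H_\beta)^k$, which is well-defined because $\|(\lambda H_\beta)^k\|\le\lambda^k$. Let us check that $\beta\mapsto A_\lambda(\beta)$ is right differentiable on $[0,1)$. Write, for any $h\in(0,1-\beta)$,

$$\begin{aligned}
A_\lambda(\beta+h)-A_\lambda(\beta) &= (I-\lambda H_{\beta+h})^{-1}\lambda(H_{\beta+h}-H_\beta)+\Delta_{\lambda,\beta,h}(I+\lambda H_\beta)\\
&= \lambda h(I-\lambda H_\beta)^{-1}(P_2-P_1)+\Delta_{\lambda,\beta,h}(I+\lambda H_\beta)+\lambda h\,\Delta_{\lambda,\beta,h}(P_2-P_1),
\end{aligned}$$

where $\Delta_{\lambda,\beta,h} := (I-\lambda H_{\beta+h})^{-1}-(I-\lambda H_\beta)^{-1}$. The differentiability follows as soon as we show that $\lim_{h\to0+}h^{-1}\Delta_{\lambda,\beta,h}$ exists. By the Neumann series representation, it is sufficient to show that $\lim_{h\to0+}h^{-1}(H_{\beta+h}^k-H_\beta^k)$ exists for all $k\ge0$. The claim is trivial with $k=0$, and the cases $k\ge1$ follow inductively by writing

$$\begin{aligned}
H_{\beta+h}^k-H_\beta^k &= H_\beta^{k-1}(H_{\beta+h}-H_\beta)+(H_{\beta+h}^{k-1}-H_\beta^{k-1})H_{\beta+h}\\
&= hH_\beta^{k-1}(P_2-P_1)+(H_{\beta+h}^{k-1}-H_\beta^{k-1})H_\beta+h(H_{\beta+h}^{k-1}-H_\beta^{k-1})(P_2-P_1).
\end{aligned}$$

Because $(I-\lambda H_\beta)A_\lambda(\beta)=I+\lambda H_\beta$, we may write

$$\begin{aligned}
\lambda h(P_2-P_1) &= (I-\lambda H_{\beta+h})A_\lambda(\beta+h)-(I-\lambda H_\beta)A_\lambda(\beta)\\
&= (I-\lambda H_{\beta+h})\big(A_\lambda(\beta+h)-A_\lambda(\beta)\big)-\lambda h(P_2-P_1)A_\lambda(\beta),
\end{aligned}$$

from which, multiplying by $h^{-1}$ and taking the limit as $h\to0+$, we obtain

$$\lambda(P_2-P_1) = (I-\lambda H_\beta)A'_\lambda(\beta)-\lambda(P_2-P_1)A_\lambda(\beta). \tag{50}$$

The desired expression for $A'_\lambda(\beta)$ follows by observing that $I+A_\lambda(\beta)=2(I-\lambda H_\beta)^{-1}$. Consider then $A''_\lambda(\beta)$. From (50), we obtain

$$(I-\lambda H_\beta)h^{-1}\big(A'_\lambda(\beta+h)-A'_\lambda(\beta)\big) = \lambda(P_2-P_1)A'_\lambda(\beta+h)+\lambda(P_2-P_1)h^{-1}\big(A_\lambda(\beta+h)-A_\lambda(\beta)\big).$$

We conclude by taking limits as $h\to0+$. □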
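The closed forms in Lemma 46 can be sanity-checked in finite dimension, where all operators are matrices. This is a minimal sketch under our own assumptions: two-state kernels reversible with respect to a hypothetical $\mu=(0.4,0.6)$, and central finite differences in $\beta$ (legitimate here because $\beta\mapsto A_\lambda(\beta)$ is smooth on the interior). The matrix identities hold on the whole space, not only on $L^2_0$; all function names are ours.

```python
def matmul(A, B):
    """Product of two dense matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def lincomb(A, B, s):
    """Entrywise A + s * B."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inv2(A):
    """Inverse of a 2 x 2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


I2 = [[1.0, 0.0], [0.0, 1.0]]


def two_state_kernel(mu, b):
    """Two-state kernel with P(1, 0) = b, reversible w.r.t. mu (mu[0] P(0, 1) = mu[1] b)."""
    a = mu[1] * b / mu[0]
    return [[1.0 - a, a], [b, 1.0 - b]]


def resolvent(P1, P2, beta, lam):
    """Return ((I - lam H_beta)^{-1}, H_beta) with H_beta = P1 + beta (P2 - P1)."""
    H = lincomb(P1, lincomb(P2, P1, -1.0), beta)
    return inv2(lincomb(I2, H, -lam)), H


def A(P1, P2, beta, lam):
    R, H = resolvent(P1, P2, beta, lam)
    return matmul(R, lincomb(I2, H, lam))


def A_prime(P1, P2, beta, lam):
    """Closed form 2 lam R (P2 - P1) R."""
    R, _ = resolvent(P1, P2, beta, lam)
    RDR = matmul(matmul(R, lincomb(P2, P1, -1.0)), R)
    return [[2.0 * lam * v for v in row] for row in RDR]


def A_second(P1, P2, beta, lam):
    """Closed form 2 lam R (P2 - P1) A'_lam(beta)."""
    R, _ = resolvent(P1, P2, beta, lam)
    RDA = matmul(matmul(R, lincomb(P2, P1, -1.0)), A_prime(P1, P2, beta, lam))
    return [[2.0 * lam * v for v in row] for row in RDA]


def neumann_A(P1, P2, beta, lam, terms=500):
    """Truncated series I + 2 sum_{k=1}^{terms} lam^k H_beta^k."""
    _, H = resolvent(P1, P2, beta, lam)
    out, power = [row[:] for row in I2], [row[:] for row in I2]
    for _ in range(terms):
        power = [[lam * v for v in row] for row in matmul(power, H)]
        out = lincomb(out, power, 2.0)
    return out


def central_difference(F, beta, h):
    """Entrywise (F(beta + h) - F(beta - h)) / (2 h)."""
    return [[v / (2.0 * h) for v in row] for row in lincomb(F(beta + h), F(beta - h), -1.0)]


def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


mu = (0.4, 0.6)
P1, P2 = two_state_kernel(mu, 0.2), two_state_kernel(mu, 0.5)
beta, lam, h = 0.3, 0.9, 1e-5
err1 = max_abs_diff(central_difference(lambda t: A(P1, P2, t, lam), beta, h),
                    A_prime(P1, P2, beta, lam))
err2 = max_abs_diff(central_difference(lambda t: A_prime(P1, P2, t, lam), beta, h),
                    A_second(P1, P2, beta, lam))
print(err1, err2)
```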

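Lemma 47. Suppose $\Pi$ is a Markov kernel reversible with respect to $\mu$, and $(X_n)_{n\ge0}$ is a Markov chain corresponding to the transition $\Pi$ with $X_0\sim\mu$. Then, for a function $f\in L^2_0(X,\mu)$,

$$\mathrm{var}(f,\Pi) = \lim_{n\to\infty}\frac{1}{n}\mathbb{E}\Big[\Big(\sum_{k=1}^nf(X_k)\Big)^2\Big] = \int\frac{1+x}{1-x}\,e_{f,\Pi}(dx)\in[0,\infty], \tag{51}$$

where $e_{f,\Pi}$ is a positive measure on $S\subset[-1,1]$ satisfying $e_{f,\Pi}(S)=\|f\|_\mu^2$.

For any $f\in L^2_0(X,\mu)$, whenever the series below is convergent, the following equality holds:

$$\mathrm{var}_\mu(f)+2\sum_{k=1}^\infty\mathbb{E}[f(X_0)f(X_k)] = \mathrm{var}(f,\Pi)<\infty. \tag{52}$$

Moreover,

$$\mathrm{var}_\lambda(f,\Pi) := \langle f,(I-\lambda\Pi)^{-1}(I+\lambda\Pi)f\rangle_\mu\in[0,\infty)$$

is well-defined for all $\lambda\in[0,1)$, and satisfies $\lim_{\lambda\to1-}\mathrm{var}_\lambda(f,\Pi)=\mathrm{var}(f,\Pi)$ and

$$\langle f,(I-\lambda\Pi)^{-1}f\rangle_\mu\ge0.$$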

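Proof. The reversibility of $\Pi$ ensures that $\Pi$ is a self-adjoint operator on $L^2_0(X,\mu)$ with a spectral radius bounded by one. Therefore, by the spectral decomposition theorem, there exists a positive measure $e_{f,\Pi}$ on the Borel subsets of the spectrum $S\subset[-1,1]$ such that $\langle f,\Pi^kf\rangle_\mu=\int_Sx^k\,e_{f,\Pi}(dx)$ for all $k\ge0$ [e.g.^, VII.2ff.]. Now, we may write

$$\begin{aligned}
\frac{1}{n}\mathbb{E}\Big[\Big(\sum_{i=1}^nf(X_i)\Big)^2\Big] &= \frac{1}{n}\Big(n\,\mathbb{E}[f^2(X_0)]+2\sum_{i=1}^n\sum_{j<i}\mathbb{E}[f(X_0)f(X_{i-j})]\Big)\\
&= \langle f,f\rangle_\mu+\frac{2}{n}\sum_{k=1}^{n-1}(n-k)\langle f,\Pi^kf\rangle_\mu\\
&= \int\Big(1+\frac{2}{n}\sum_{k=1}^{n-1}(n-k)x^k\Big)e_{f,\Pi}(dx).
\end{aligned} \tag{53}$$

Because $x(1-x)^{-1}=\sum_{k=1}^\infty x^k$ for all $|x|<1$, it is straightforward to verify by Kronecker's lemma that (51) holds. Similarly, whenever the sum in (52) is convergent, it is easy to see that the term (53) converges to (52).

The expression for $A_\lambda(1)$ in Lemma 46 (with $P_2=\Pi$) allows us to write

$$\begin{aligned}
\mathrm{var}_\lambda(f,\Pi) &= \langle f,f\rangle_\mu+2\sum_{k=1}^\infty\lambda^k\langle f,\Pi^kf\rangle_\mu = \int\Big(1+2\sum_{k=1}^\infty(\lambda x)^k\Big)e_{f,\Pi}(dx)\\
&= \int_{S\cap[0,1]}\frac{1+\lambda x}{1-\lambda x}\,e_{f,\Pi}(dx)+\int_{S\cap[-1,0)}\frac{1+\lambda x}{1-\lambda x}\,e_{f,\Pi}(dx)\in[0,\infty).
\end{aligned}$$

We conclude that $\lim_{\lambda\to1-}\mathrm{var}_\lambda(f,\Pi)=\mathrm{var}(f,\Pi)$ by the monotone convergence theorem, applied separately to the two integrals: the integrand is increasing in $\lambda$ on $[0,1]$, and decreasing in $\lambda$ and bounded by one on $[-1,0)$. For the last claim, we use the Neumann series definition of $(I-\lambda\Pi)^{-1}$,

$$\langle f,(I-\lambda\Pi)^{-1}f\rangle_\mu = \sum_{k=0}^\infty\lambda^k\langle f,\Pi^kf\rangle_\mu = \int\frac{1}{1-\lambda x}\,e_{f,\Pi}(dx)\in[0,\infty).$$

□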
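Lemma 47 can be illustrated on a finite reversible chain. The sketch below is our own (`mh_kernel`, `var_lambda` and `var_series` are hypothetical helpers, and the three-state target is an arbitrary choice): it evaluates $\mathrm{var}_\lambda(f,\Pi)$ by solving $(I-\lambda\Pi)h=f$, using $(I-\lambda\Pi)^{-1}(I+\lambda\Pi)=2(I-\lambda\Pi)^{-1}-I$, and compares it with a truncation of the series in (52). For this chain the spectrum on $L^2_0$ is $\{1/3,1/9\}$, so $\lambda\mapsto\mathrm{var}_\lambda(f,\Pi)$ is increasing; with negative eigenvalues that monotonicity need not hold, although the limit does.

```python
def mh_kernel(pi):
    """Uniform-proposal Metropolis-Hastings kernel, reversible w.r.t. pi."""
    m = len(pi)
    P = [[0.0] * m for _ in range(m)]
    for x in range(m):
        for y in range(m):
            if y != x:
                P[x][y] = min(1.0, pi[y] / pi[x]) / m
        P[x][x] = 1.0 - sum(P[x])
    return P


def solve(A, b):
    """Solve A h = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (M[r][n] - sum(M[r][c] * h[c] for c in range(r + 1, n))) / M[r][r]
    return h


def inner(pi, u, v):
    """<u, v>_pi."""
    return sum(p * a * b for p, a, b in zip(pi, u, v))


def var_lambda(P, pi, f, lam):
    """Return (var_lambda(f, P), <f, (I - lam P)^{-1} f>_pi) for centred f."""
    n = len(pi)
    A = [[(1.0 if i == j else 0.0) - lam * P[i][j] for j in range(n)] for i in range(n)]
    h = solve(A, f)  # h = (I - lam P)^{-1} f
    return 2.0 * inner(pi, f, h) - inner(pi, f, f), inner(pi, f, h)


def var_series(P, pi, f, terms=200):
    """pi(f^2) + 2 sum_{k=1}^{terms} <f, P^k f>_pi, a truncation of (52)."""
    total = inner(pi, f, f)
    Pkf = list(f)
    for _ in range(terms):
        Pkf = [sum(P[i][j] * Pkf[j] for j in range(len(f))) for i in range(len(f))]
        total += 2.0 * inner(pi, f, Pkf)
    return total


pi = [0.5, 0.3, 0.2]
P = mh_kernel(pi)
f = [0.3, -2.7, 3.3]  # centred: pi(f) = 0
series = var_series(P, pi, f)
approx = [var_lambda(P, pi, f, lam)[0] for lam in (0.5, 0.9, 0.99, 1.0 - 1e-6)]
print(approx, series)
```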

Appendix B. Lemmas for Section 3

We include the following result for the sake of self-containedness; the idea of 
the proof was pointed out also in j^, Theorem A. 2]. 

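Lemma 48. Let $A$ and $B$ be self-adjoint operators on a Hilbert space $\mathcal{H}$ satisfying $0\le\langle f,Af\rangle\le\langle f,Bf\rangle$ for all $f\in\mathcal{H}$, and suppose that the inverses $A^{-1}$ and $B^{-1}$ exist. Then, $0\le\langle f,B^{-1}f\rangle\le\langle f,A^{-1}f\rangle$ for all $f\in\mathcal{H}$.

Proof. The claim follows easily as soon as we prove $\langle f,A^{-1}f\rangle=\sup_{g\in\mathcal{H}}\big[2\langle g,f\rangle-\langle g,Ag\rangle\big]$: indeed, then $2\langle g,f\rangle-\langle g,Bg\rangle\le2\langle g,f\rangle-\langle g,Ag\rangle$ for every $g$, and taking suprema (with $g=0$ giving the lower bound zero) yields the claim. This identity follows from

$$\begin{aligned}
\langle f,A^{-1}f\rangle-2\langle g,f\rangle+\langle g,Ag\rangle &= \langle f-Ag,A^{-1}f\rangle+\langle g,Ag-f\rangle\\
&= \langle A^{-1}f-g,A(A^{-1}f-g)\rangle\ge0,
\end{aligned}$$

and because the supremum is attained with $g=A^{-1}f$. □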
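The variational identity behind Lemma 48 is easy to test with $2\times2$ positive definite matrices. In this sketch (our illustration; the matrices are arbitrary choices with $B-A$ positive semi-definite, and the helper names are ours) no random $g$ exceeds the value attained at $g=A^{-1}f$, and the order of the quadratic forms reverses under inversion.

```python
import random


def inv2(A):
    """Inverse of a 2 x 2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quad(A, v):
    """<v, A v>."""
    return dot(v, matvec(A, v))


def variational(A, f, g):
    """2 <g, f> - <g, A g>; its supremum over g equals <f, A^{-1} f>."""
    return 2.0 * dot(g, f) - quad(A, g)


A = [[2.0, 0.5], [0.5, 1.0]]
B = [[3.0, 0.5], [0.5, 1.5]]  # B - A = diag(1, 0.5) is positive semi-definite
Ainv, Binv = inv2(A), inv2(B)

rng = random.Random(0)
fs = [[rng.uniform(-3, 3), rng.uniform(-3, 3)] for _ in range(200)]
gs = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(200)]
f = fs[0]
print(quad(Ainv, f), max(variational(A, f, g) for g in gs))
```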

Lemma 49. Suppose $P$ is a Metropolis-Hastings kernel given in (1) and $\rho(x)$ is given in (2). Then, the spectral gap of $P$ defined in (10) satisfies

(i) for any set $A\in\mathcal{B}(X)$ with $\pi(A)\in(0,1)$,

$$\mathrm{Gap}(P)\le(1-\pi(A))^{-1}\Big(1-\inf_{x\in A}\rho(x)\Big),$$

(ii) if $\pi$ does not have point masses, that is, $\pi(\{x\})=0$ for all $x\in X$, then $\mathrm{Gap}(P)\le1-\rho(x)$ for $\pi$-almost every $x\in X$.

Proof. We first check (i). Denote $p=\pi(A)\in(0,1)$ and define $f(x)=a1\{x\in A\}-b1\{x\notin A\}$, where the constants $a,b\in(0,\infty)$ are chosen so that $\pi(f)=ap-b(1-p)=0$ and $\pi(f^2)=a^2p+b^2(1-p)=1$. We may compute

$$\begin{aligned}
\mathcal{E}_P(f) &= \frac{1}{2}\int\pi(dx)q(x,dy)\min\{1,r(x,y)\}[f(x)-f(y)]^2\\
&= (a+b)^2\int_A\pi(dx)\int_{A^c}q(x,dy)\min\{1,r(x,y)\}\\
&\le (a+b)^2\int_A\pi(dx)(1-\rho(x))\le(a+b)^2p\Big(1-\inf_{x\in A}\rho(x)\Big).
\end{aligned}$$

Now, according to our choice of $a$ and $b$,

$$(a+b)^2p = \big(1-b^2(1-p)\big)+2b^2(1-p)+b^2p = 1+b^2 = (1-p)^{-1},$$

which yields (i), because $\mathrm{Gap}(P)\le\mathcal{E}_P(f)$ for such $f$.

Consider then (ii). The case $\mathrm{Gap}(P)=0$ is trivial, so assume $\mathrm{Gap}(P)>0$ and assume the claim does not hold. Then, there exists an $\epsilon>0$ and a set $A\in\mathcal{B}(X)$ with $p:=\pi(A)\in(0,1)$ such that $1-\rho(x)\le\mathrm{Gap}(P)-\epsilon$ for all $x\in A$. From (i), $\mathrm{Gap}(P)\le(1-p)^{-1}(\mathrm{Gap}(P)-\epsilon)$. Because $\pi$ is not concentrated on points, we may choose $p$ as small as we want, which leads to a contradiction. □
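The test function in the proof of Lemma 49 (i) can be computed explicitly on a finite state space. The sketch below is ours and assumes a three-state target with a uniform proposal, where $\rho(x)$ includes the probability of proposing $x$ itself, as in (2); the helper names are hypothetical. It checks that $\pi(f)=0$, $\pi(f^2)=1$, $(a+b)^2p=(1-p)^{-1}$, and $\mathrm{Gap}(P)\le\mathcal{E}_P(f)\le(1-p)^{-1}(1-\inf_{x\in A}\rho(x))$ for every non-trivial $A$, taking the gap to be one minus the largest eigenvalue below one.

```python
import math
from itertools import combinations


def mh_kernel_and_rejection(pi):
    """Uniform-proposal Metropolis-Hastings kernel and rho(x) from (2)."""
    m = len(pi)
    P = [[0.0] * m for _ in range(m)]
    rho = []
    for x in range(m):
        # q(x, y) min{1, r(x, y)} with q(x, y) = 1/m, including y = x
        accept = [min(1.0, pi[y] / pi[x]) / m for y in range(m)]
        rho.append(1.0 - sum(accept))
        for y in range(m):
            if y != x:
                P[x][y] = accept[y]
        P[x][x] = 1.0 - sum(P[x])
    return P, rho


def dirichlet_form(P, pi, f):
    """E_P(f) = (1/2) sum_{x, y} pi(x) P(x, y) (f(x) - f(y))^2."""
    m = len(pi)
    return 0.5 * sum(pi[x] * P[x][y] * (f[x] - f[y]) ** 2
                     for x in range(m) for y in range(m))


def right_spectral_gap_3state(P):
    """1 - max(lambda_2, lambda_3), via t^2 - (tr P - 1) t + det P = 0."""
    s = P[0][0] + P[1][1] + P[2][2] - 1.0
    d = (P[0][0] * (P[1][1] * P[2][2] - P[1][2] * P[2][1])
         - P[0][1] * (P[1][0] * P[2][2] - P[1][2] * P[2][0])
         + P[0][2] * (P[1][0] * P[2][1] - P[1][1] * P[2][0]))
    return 1.0 - (s + math.sqrt(max(s * s - 4.0 * d, 0.0))) / 2.0


pi = [0.5, 0.3, 0.2]
P, rho = mh_kernel_and_rejection(pi)
gap = right_spectral_gap_3state(P)
rows = []
for size in (1, 2):
    for A in combinations(range(3), size):
        p = sum(pi[x] for x in A)
        a, b = math.sqrt((1.0 - p) / p), math.sqrt(p / (1.0 - p))
        f = [a if x in A else -b for x in range(3)]  # a 1_A - b 1_{A^c}
        rows.append({
            "p": p,
            "mean": sum(w * v for w, v in zip(pi, f)),
            "second_moment": sum(w * v * v for w, v in zip(pi, f)),
            "scale": (a + b) ** 2 * p,
            "energy": dirichlet_form(P, pi, f),
            "bound": (1.0 - min(rho[x] for x in A)) / (1.0 - p),
        })
print(gap, [(r["energy"], r["bound"]) for r in rows])
```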
Lemma 50. Let $a,b\ge0$. Then,

$$\big|\min\{1,ab\}-\min\{1,a\}\big|\le\min\{1,a\}|1-b|. \tag{54}$$

Proof. If $a\le1$ and $ab\le1$, or $a\ge1$ and $ab\ge1$, or either $a=0$ or $b=0$, then (54) is immediate. If $0<a<1$ and $ab>1$, then $b>a^{-1}>1$ and

$$\big|\min\{1,ab\}-\min\{1,a\}\big| = a|a^{-1}-1|\le a|b-1| = \min\{1,a\}|1-b|.$$

If $a>1$ and $ab<1$, then $b<a^{-1}<1$ and (54) is established by

$$\big|\min\{1,ab\}-\min\{1,a\}\big| = 1-ab\le1-b = \min\{1,a\}|1-b|.$$

□
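Inequality (54) can also be checked by brute force. This is only a numerical sanity check under our own choice of sampling distribution, not a substitute for the case analysis above; `lhs`, `rhs` and `worst_violation` are hypothetical names.

```python
import random


def lhs(a, b):
    """Left-hand side of (54)."""
    return abs(min(1.0, a * b) - min(1.0, a))


def rhs(a, b):
    """Right-hand side of (54)."""
    return min(1.0, a) * abs(1.0 - b)


def worst_violation(trials=100_000, seed=1):
    """Largest lhs - rhs over random (a, b) in [0, 4]^2, with exact zeros mixed in."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        a = 0.0 if rng.random() < 0.05 else rng.uniform(0.0, 4.0)
        b = 0.0 if rng.random() < 0.05 else rng.uniform(0.0, 4.0)
        worst = max(worst, lhs(a, b) - rhs(a, b))
    return worst


# One representative (a, b) for each case of the proof:
# a <= 1 and ab <= 1; a >= 1 and ab >= 1; 0 < a < 1 < ab; ab < 1 < a.
cases = [(0.5, 1.5), (2.0, 0.8), (0.5, 3.0), (2.0, 0.2)]
for a, b in cases:
    print(a, b, lhs(a, b), rhs(a, b))
```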

Appendix C. Lemmas for Section 4

Lemma 51. Suppose $X=(X_1,\ldots,X_n)$ and $Y=(Y_1,\ldots,Y_n)$ are Markov chains on a common state space $(X,\mathcal{B}(X))$ with kernels $P$ and $Q$ and initial distributions $\pi$ and $\varpi$, respectively, which are invariant, that is, $\pi P=\pi$ and $\varpi Q=\varpi$. Then, the distributions of $X$ and $Y$, denoted as $\mu_X$ and $\mu_Y$, satisfy the following inequality for any $C\in\mathcal{B}(X)$:

$$\|\mu_X-\mu_Y\|\le\|\pi-\varpi\|+2(n-1)\pi(C^c)+(n-1)\sup_{x\in C}\|P(x,\cdot)-Q(x,\cdot)\|,$$

where $\|\mu_X-\mu_Y\| := \sup_{|f|\le1}|\mu_X(f)-\mu_Y(f)|$ denotes the total variation.

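Proof. Let $A\in\mathcal{B}(X)^{\otimes n}$. We shall use the shorthand notation $x=x_{1:n}=(x_1,\ldots,x_n)$ and denote $g_P^{(n)}(x)=1\{x\in A\}$ and

$$g_P^{(k)}(x_{1:k}) := \int P(x_k,dx_{k+1})\cdots\int P(x_{n-1},dx_n)1\{x\in A\},\qquad1\le k\le n-1,$$

and $g_P:=g_P^{(1)}$, and define $g_Q^{(k)}$ similarly using the kernel $Q$.

Note that $g_P^{(k)}$ and $g_Q^{(k)}$ take values between zero and one, and that the total variation satisfies $\|\pi-\varpi\|=2\sup_{0\le f\le1}|\pi(f)-\varpi(f)|$. Therefore,

$$|\mu_X(A)-\mu_Y(A)| = |\pi(g_P)-\varpi(g_Q)|\le\frac{1}{2}\|\pi-\varpi\|+|\pi(g_P-g_Q)|,$$

showing the claim for $n=1$, since then $g_P=g_Q$. Assume then $n\ge2$ and observe that we can write $|\pi(g_P-g_Q)|=|\mathbb{E}[g_P(X_1)-g_Q(X_1)]|$. We may continue inductively, for $1\le k\le n-1$,

$$\big|\mathbb{E}[(g_P^{(k)}-g_Q^{(k)})(X_{1:k})]\big|\le\big|\mathbb{E}[(g_P^{(k+1)}-g_Q^{(k+1)})(X_{1:k+1})]\big|+\Big|\mathbb{E}\Big[\int\Delta(X_k,dx_{k+1})\,g_Q^{(k+1)}(X_{1:k},x_{k+1})\Big]\Big|,$$

where $\Delta(x,dy):=P(x,dy)-Q(x,dy)$; the first term on the right vanishes for $k=n-1$ because $g_P^{(n)}=g_Q^{(n)}$. Since $X_k\sim\pi$ by invariance, we observe that

$$\Big|\mathbb{E}\Big[\int\Delta(X_k,dx_{k+1})\,g_Q^{(k+1)}(X_{1:k},x_{k+1})\Big]\Big|\le\pi(C^c)+\frac{1}{2}\sup_{x\in C}\|P(x,\cdot)-Q(x,\cdot)\|,$$

because $|\int\Delta(X_k,dx_{k+1})\,g_Q^{(k+1)}(X_{1:k},x_{k+1})|\le1$ and $0\le g_Q^{(k+1)}\le1$. Summing over $k$ and using $\|\mu_X-\mu_Y\|=2\sup_A|\mu_X(A)-\mu_Y(A)|$ yields the claim. □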
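For a small finite state space, both sides of Lemma 51 can be computed exactly by enumerating all paths. The sketch below is our own: it assumes a three-state kernel $P$, a perturbation $Q=0.9P+0.1K$ with a hypothetical fixed row $K$, and $n=4$, and checks the inequality for every choice of $C$. Total variation is computed as the sum of absolute differences, which equals $\sup_{|f|\le1}|\mu(f)-\nu(f)|$ on a finite space.

```python
import itertools


def stationary(P, iters=5000):
    """Invariant distribution of an ergodic finite kernel by power iteration."""
    m = len(P)
    v = [1.0 / m] * m
    for _ in range(iters):
        v = [sum(v[i] * P[i][j] for i in range(m)) for j in range(m)]
    return v


def path_law(init, P, n):
    """Probabilities of all paths (x_1, ..., x_n) of the chain started from init."""
    m = len(P)
    law = {}
    for path in itertools.product(range(m), repeat=n):
        prob = init[path[0]]
        for s, t in zip(path, path[1:]):
            prob *= P[s][t]
        law[path] = prob
    return law


def tv(mu, nu):
    """sup_{|f| <= 1} |mu(f) - nu(f)|, i.e. the sum of absolute differences."""
    return sum(abs(mu[k] - nu[k]) for k in mu)


def lemma51_bound(pi, varpi, P, Q, n, C):
    """Right-hand side of Lemma 51 for a subset C of the states."""
    m = len(P)
    tv0 = sum(abs(p - w) for p, w in zip(pi, varpi))
    outside = sum(pi[x] for x in range(m) if x not in C)
    kernel = max((sum(abs(P[x][y] - Q[x][y]) for y in range(m)) for x in C), default=0.0)
    return tv0 + 2 * (n - 1) * outside + (n - 1) * kernel


P = [[2 / 3, 1 / 5, 2 / 15], [1 / 3, 4 / 9, 2 / 9], [1 / 3, 1 / 3, 1 / 3]]
K = [0.2, 0.3, 0.5]
Q = [[0.9 * p + 0.1 * k for p, k in zip(row, K)] for row in P]
pi, varpi = stationary(P), stationary(Q)
n = 4
lhs = tv(path_law(pi, P, n), path_law(varpi, Q, n))
print(lhs, lemma51_bound(pi, varpi, P, Q, n, (0, 1, 2)))
```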
Appendix D. Lemmas for Section 7



We denote by $n(x):=x/|x|$ the unit vector pointing in the direction of $x\neq0$, and by $B(x,r):=\{y\in\mathbb{R}^d:|x-y|\le r\}$ the (closed) Euclidean ball.

Lemma 52. Assume $\pi$ satisfies Condition 31 and that $c:X\to[1,\infty)$ satisfies $\limsup_{|x|\to\infty}c(x)e^{-|x|}<\infty$. Then, there exist constants $M,b\in[1,\infty)$ such that

$$\Big\{y\in\mathbb{R}^d : c^{-1}(x)<\frac{\pi(y)}{\pi(x)}<c(x)\Big\}\subset B(0,b|x|)\setminus B(0,b^{-1}|x|)\quad\text{for all }|x|\ge M.$$



Proof. Let $c'>\limsup_{|x|\to\infty}c(x)e^{-|x|}$. Choose any $C\in(4,\infty)$ and let $M_0\in[1\vee\log c',\infty)$ be sufficiently large so that there exists a $\beta_\pi\in(0,1]$ such that for all $|x|\ge M_0$,

$$c(x)\le c'e^{|x|},\qquad n(x)\cdot\nabla\log\pi(x)\le-C\qquad\text{and}\qquad n(x)\cdot n(\nabla\pi(x))\le-\beta_\pi.$$

Let $\delta\in(0,1)$; then for any $|x|\ge M_0(1-\delta)^{-1}$ and all $z=tn(x)$ with $|t|\le\delta|x|$, we have

$$\Big|\log\frac{\pi(x)}{\pi(x+z)}\Big| = |t|\int_0^1|n(x+\lambda z)\cdot\nabla\log\pi(x+\lambda z)|\,d\lambda\ge C|t|. \tag{55}$$

Now, if $|x|\ge aM_0$ where $a:=\exp(2\pi\tan(\arccos(\beta_\pi)))$, then [28, Lemma 22] implies

$$\{y\in\mathbb{R}^d:\pi(y)=\pi(x)\}\subset B(0,a|x|)\setminus B(0,a^{-1}|x|). \tag{56}$$

Take any $M\ge4aM_0$, and choose $|x|\ge M$. Then, the condition (55) implies that any point $y=\lambda x$ of the set on the left-hand side of the claim, where $\lambda>0$, satisfies

$$\big|(\lambda-1)|x|\big|\le C^{-1}\log c(x)\le C^{-1}(\log(c')+|x|)\le2C^{-1}|x|.$$

We deduce that $|\lambda-1|\le1/2$. Again, using (56), we deduce that the claim holds with $b=2a$. □

Lemma 53. Assume $\pi$ satisfies Condition 31.

(i) Then, for any constant $\nu\in(0,\infty)$, there exists a constant $b_\nu\in[1,\infty)$ such that for all $x\in X$, $\{x+z:\pi(x+z)/\pi(x)\ge\nu\}\subset B(0,b_\nu(|x|\vee1))$.

Assume also that $q$ satisfies Condition 31.

(ii) There exists a constant $\nu\in(0,\infty)$ such that $\inf_{x\in X}q(\{z:\pi(x+z)/\pi(x)\ge\nu\})>0$.

(iii) For any constant $\nu\in(0,\infty)$, there exists a constant $M=M(\nu)\in[1,\infty)$ such that $\inf_{|x|\ge M}q(\{z:\pi(x+z)/\pi(x)\ge\nu\})>0$.




Proof. Consider first (i). The existence of such a finite constant follows for $x$ in compact sets by the continuity of $\pi$, and in the tails by Lemma 52.

The claim (ii) follows on compact sets by the continuity of $\log\pi$, and in the tails as in [isl, proof of Theorem 4.3]; the last claim (iii) follows similarly. □

When the target and the proposal distributions also satisfy Condition HH, we have a decay rate for $q(D_x)$.

Lemma 54. Assume Condition and assume $\limsup_{|x|\to\infty}c(x)e^{-|x|}<\infty$. Then, for any $\epsilon'>0$ there exists a constant $M_0\in[M,\infty)$ such that for all $|x|\ge M_0$,

$$q(D_x)\le\epsilon'\,\frac{\log c(x)}{|x|^{\rho-1}}.$$

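Proof. Lemma 52 implies that there exist $b\in[1,\infty)$ and $M'\in[1,\infty)$ such that for all $|x|\ge M'$ the annulus condition $D_x+x\subset B(0,b|x|)\setminus B(0,b^{-1}|x|)$ holds. This implies that for any constant $C_\epsilon\in[1,\infty)$ one can choose $M_1\in[M',\infty)$ such that

$$n(x+z)\cdot\nabla\log\pi(x+z)\le-C_\epsilon|x+z|^{\rho-1}\quad\text{for all }|x|\ge M_1,\ z\in D_x.$$

Denoting $\ell(x):=\log\pi(x)$, we write

$$D_x=\{z\in\mathbb{R}^d:|\ell(x+z)-\ell(x)|<\log c(x)\}.$$

Define the contour surface set $S_{\pi(x)}:=\{y\in\mathbb{R}^d:\pi(y)=\pi(x)\}$ and

$$C_{\pi(x)}(\delta):=\{y+tn(y):y\in S_{\pi(x)},\ |t|\le\delta\}.$$

We will now check that, with our conditions, for $|x|\ge M_1b$,

$$D_x+x\subset C_{\pi(x)}(\delta_x),\qquad\text{where }\delta_x:=\frac{b^{|\rho-1|}\log c(x)}{C_\epsilon|x|^{\rho-1}}. \tag{57}$$

Because $D_x+x=D_y+y$ whenever $\pi(x)=\pi(y)$, it is sufficient to consider $z\in D_x$ such that $z=tn(x)$. As in the proof of Lemma 52,

$$|\ell(x+z)-\ell(x)| = |t|\int_0^1|n(x+\lambda z)\cdot\nabla\ell(x+\lambda z)|\,d\lambda\ge|t|C_\epsilon|x|^{\rho-1}\int_0^1\Big(\frac{|x+\lambda z|}{|x|}\Big)^{\rho-1}d\lambda\ge C_\epsilon b^{-|\rho-1|}|x|^{\rho-1}|t|.$$

Now $|\ell(x+z)-\ell(x)|<\log c(x)$ implies (57). Write then, by Fubini's theorem,

$$\begin{aligned}
q(D_x) &\le \int_{C_{\pi(x)}(\delta_x)-x}q(z)\,dz = \int_0^\infty\mathcal{L}^d\big(\{z\in\mathbb{R}^d:q(|z|)>t,\ z\in C_{\pi(x)}(\delta_x)-x\}\big)\,dt\\
&= \int_0^\infty\mathcal{L}^d\big(\{z\in\mathbb{R}^d:|z|<u,\ z\in C_{\pi(x)}(\delta_x)-x\}\big)\,|q'(u)|\,du.
\end{aligned}$$

Now, [13, proof of Theorem 4.1] shows that for $u\le|x|/2$,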
where $c_d=\mathcal{L}^d(B(0,1))$. By polar integration,

J/ES^Ca;) J \y\-&x. 

where the latter inequality holds for $u>|x|/2$. We obtain

$$q(D_x)\le c'\delta_x\Big(1+\int_0^\infty u^d|q'(u)|\,du\Big),$$

and because $q$ is monotone decreasing, integration by parts yields, for any $M\in(0,\infty)$,

$$\int_0^Mu^d|q'(u)|\,du = d\int_0^Mu^{d-1}q(u)\,du-M^dq(M)\le c_d^{-1}\int_{\mathbb{R}^d}q(x)\,dx<\infty.$$

We deduce $q(D_x)\le c''\delta_x$, and conclude by choosing $C_\epsilon$ sufficiently large. □